We sampled Thaumarchaeota populations in the northern Gulf of Mexico, including shelf
waters under the Mississippi River outflow plume that are subject to recurrent hypoxia.
Data from this study allowed us to: (1) test the hypothesis that Thaumarchaeota would 
be abundant in this region; (2) assess phylogenetic composition of these populations for 
comparison with other regions; (3) compare the efficacy of quantitative PCR (qPCR) based
on primers for 16S rRNA genes (rrs) with primers for genes in the ammonia oxidation
(amoA) and carbon fixation (accA, hcd) pathways; (4) compare distributions obtained by
qPCR with the relative abundance of Thaumarchaeota rrs in pyrosequenced libraries; (5)
compare Thaumarchaeota distributions with environmental variables to help us elucidate 
the factors responsible for the distributions; (6) compare the distribution of Thaumarchaeota
with Nitrite-Oxidizing Bacteria (NOB) to gain insight into the coupling between ammonia 
and nitrite oxidation. We found up to 10^ copies L⁻¹ of Thaumarchaeota rrs in our
samples (up to 40% of prokaryotes) by qPCR, with maximum abundance in slope waters at
200-800 m. Thaumarchaeota rrs were also abundant in pyrosequenced libraries, and their
relative abundance correlated well with values determined by qPCR (r² = 0.82).
Thaumarchaeota populations were strongly stratified by depth. Canonical correspondence
analysis using a suite of environmental variables explained 92% of the variance in
qPCR-estimated gene abundances. Thaumarchaeota rrs abundance was correlated with salinity
and depth, while accA abundance correlated with fluorescence and pH. Correlations of
Archaeal amoA abundance with environmental variables were primer-dependent, suggesting
differential responses of sub-populations to environmental variables. Bacterial amoA was
at the limit of qPCR detection in most samples. NOB and Euryarchaeota rrs were found in
the pyrosequenced libraries; NOB distribution was correlated with that of Thaumarchaeota
(r² = 0.49).



Keywords: thaumarchaeota, euryarchaeota, nitrite-oxidizing Bacteria, hypoxia, Gulf of Mexico, ammonia monooxygenase,
acetyl-CoA/propionyl-CoA carboxylase, 4-hydroxybutyryl-CoA dehydratase



INTRODUCTION 

The Mississippi River outflow forms a surface plume up to 10 m
thick upon entering the northern Gulf of Mexico. Stratification
and nutrient (especially nitrogen) enrichment of river water
(Turner et al., 2006) lead to elevated primary production in the
plume and thus to increased organic matter deposition 10 to
100 km away from river discharge sites (Rabalais et al., 2002; Green
et al., 2008). Decomposition of this organic matter is thought
to contribute to the formation of a recurrent hypoxic zone in
the northern Gulf of Mexico that profoundly affects the ecology,
fisheries biology, and geochemistry of the region (Rabalais et al.,
2002; Dagg et al., 2007; Cai et al., 2011). Intermittent hypoxia
([O2] < 2 mL/L or ~90 µM; Diaz and Rosenberg, 2008) begins
to develop in February and typically is most pronounced from
mid-May to mid-September (Rabalais et al., 2010).

Processes such as coupled nitrification/denitrification that
remove excess fixed nitrogen affect primary production and thus
may be important determinants of the extent and duration of
hypoxia. Ammonia oxidation is the first step in the biogeochemical
pathway leading to denitrification. Members of the β- and γ-subdivisions
of the Proteobacteria (Ammonia-Oxidizing Bacteria, AOB) and Marine
Group 1 Archaea (Ammonia-Oxidizing Archaea, AOA) can grow
chemoautotrophically by oxidizing ammonia to nitrite (Ward, 2011).
The nitrite produced can be oxidized further to nitrate by
Nitrite-Oxidizing Bacteria (NOB) and then denitrified (Jetten, 2001;
Francis et al., 2007; Ward et al., 2009).

Ammonia monooxygenase genes (amoA) from AOA have been
observed in marine environments at 10-1,000 times greater abundance
than the amoA homolog from AOB, suggesting that the
AOA play a key role in the marine nitrogen cycle (Francis et al.,
2005, 2007; Wuchter et al., 2006; Mincer et al., 2007; Prosser and
Nicol, 2008; Santoro et al., 2010; Ward, 2011). Currently, the functional
guild of marine AOA includes members of the Marine
Group 1 Archaea (DeLong, 1992; Fuhrman et al., 1992) and organisms
related to a deeply branching clade (pSL12) of hot-spring
crenarchaeotes (Barns et al., 1996) that are predicted to possess
the amoA gene (Mincer et al., 2007). Genomic evidence suggests
that Marine Group 1 Archaea and related organisms from benthic,
terrestrial, and hot-spring habitats, as well as a sponge symbiont,
should be assigned to a new phylum, the Thaumarchaeota, within
the domain Archaea (Brochier-Armanet et al., 2008; Spang et al.,
2010; Kelly et al., 2011). We use this term hereinafter in place of
"Marine Group 1 Archaea."

Pelagic marine Thaumarchaeota are typically most abundant
below ~100 m depth in the water column (DeLong, 1992;
Fuhrman et al., 1992; Massana et al., 1997; Karner et al., 2001;
Mincer et al., 2007; Church et al., 2010; Santoro et al., 2010), in
surface waters at higher latitudes and in polar oceans (Massana et al.,
1998; Murray et al., 1998, 1999b; Church et al., 2003; Alonso-Saez
et al., 2008; Kalanetra et al., 2009), and in hypoxic regions and oxygen
minimum zones (OMZs; [O2] < 0.5 mL/L or < 22 µM; Levin,
2003) such as the Black Sea, Baltic Sea, Gulf of California, Arabian
Sea, and eastern tropical Pacific Ocean (Coolen et al., 2007;
Lam et al., 2007, 2009; Beman et al., 2008; Labrenz et al., 2010;
Molina et al., 2010). Previous studies, though sometimes contradictory,
have pointed to environmental factors such as salinity, light,
temperature, ammonium, oxygen, and sulfide as major determinants of
this distribution (e.g., Murray et al., 1999a; Caffrey et al., 2007;
Santoro et al., 2008; Bernhard et al., 2010; Gubry-Rangin et al.,
2010; reviewed in Prosser and Nicol, 2008; Erguder et al., 2009;
Nicol et al., 2011; Ward, 2011). Bacterial or phytoplankton biomass
has also been thought to influence Thaumarchaeota distributions
(Murray et al., 1999a,b; Church et al., 2003), perhaps through
competition for resources.

One of the goals of the present study was to quantify the
distribution of AOA in the northern Gulf of Mexico in the
area influenced by the Mississippi River plume and recurrent
hypoxia. We hypothesized that ammonia oxidizers would be abundant
there because of the high riverine nitrogen loading to the
region and the importance of respiration (Cai et al., 2011), and
thus presumably of nitrogen regeneration, in the region experiencing
hypoxia. We also hypothesized that AOA would dominate
ammonia oxidizer populations at pelagic stations, although AOB
were found to be more abundant than AOA in sediments from
Weeks Bay, Alabama (Caffrey et al., 2007). To test these hypotheses,
we determined AOA and AOB distributions by quantitative
PCR (qPCR) measurements of the abundance of rrs and amoA
genes. We also pyrosequenced rrs genes from our samples as
an independent check on distributions based on qPCR data. A
second goal was to analyze variation in sequences of rrs and
compare it with genes from two metabolic pathways that are
important to AOA, ammonia oxidation and carbon fixation, to
provide a more highly resolved description of the composition of
Thaumarchaeota populations than can be obtained from analyses
of single genes. AOA can grow autotrophically (Könneke et al.,
2005) using the 3-hydroxypropionate/4-hydroxybutyrate pathway
(Berg et al., 2007). The potential for AOA autotrophy can be
detected in the environment using primers targeting the genes
in this pathway, notably acetyl-CoA/propionyl-CoA carboxylase
(accA; Yakimov et al., 2009) and 4-hydroxybutyryl-CoA dehydratase
(hcd; Offre et al., 2011). We tested both of these primer
sets with our samples. We compared the phylogenetic diversity
present in their amplicons with the diversity represented in
amplicons from more widely used primer sets for amoA and rrs. We
then used rrs sequences from the pyrosequencing effort to extend
phylogenetic inferences based on samples taken
at one station more broadly across the study area. A third goal
was to investigate the relationship between Thaumarchaeota
distributions and environmental variables to provide insight into
the factors controlling their distribution. Pyrosequencing data
were also used to compare the distribution of NOB with that of AOA
to gain insight into the coupling between these two steps of
nitrification.

MATERIALS AND METHODS 

SAMPLE COLLECTION AND DNA EXTRACTION 

Samples were collected during the R/V Cape Hatteras GulfCar- 
bon 5 cruise in the northern Gulf of Mexico (30°07'N, 088°02'W 
to 27°39'N, 093°39'W; Figure 1) from March 10-21, 2010. Sam- 
ples were collected using Niskin bottles and a General Oceanics 
rosette sampling system equipped with an SBE25 CTD and sensors 
for [O2], beam attenuation (turbidity), and relative fluorescence 
(calibrated to chlorophyll a equivalents). The [O2] sensor was 
cross-calibrated against Winkler titrations of [O2] in samples 
collected at fixed depths. pH data were collected using a glass 
electrode by W.-J. Huang of Dr. W.-J. Cai's group. Euphotic depth
(defined as 1% PAR, 400-700 nm) was calculated for each station
from Aqua MODIS satellite data using an average of the Lee
and Morel models by H. Reader and C. Fichot. Nutrient data
were collected by Dr. S. Lohrenz's group at some of the stations and
depths we sampled. Because nutrient sampling was biased toward
near-surface samples on the continental shelf, these data
were used only in BEST analysis (see Appendix). Approximately 1 L
of water from each Niskin bottle was pressure filtered (at ~60 kPa)
through 0.22 µm Durapore filters (Millipore); filters were frozen
in 2 mL of lysis buffer (0.75 M sucrose, 40 mM EDTA, 50 mM
Tris; pH 8.3). DNA was extracted by enzymatic hydrolysis with
lysozyme (50 mg mL⁻¹), proteinase K (20 mg mL⁻¹), and sodium
dodecyl sulfate (100 µL of a 10% solution), and then purified by
phenol-chloroform extraction as described previously (Bano and
Hollibaugh, 2000).

FIGURE 1 | Stations occupied during the GulfCarbon 5 cruise, March
10-21, 2010. Inshore stations are represented with a filled star; offshore
stations with an open star.

QUANTITATIVE PCR

Quantitative PCR was performed using an iCycler iQ™ Real-Time
qPCR detection system (Bio-Rad) and the primers listed in
Table A1 in Appendix. qPCR reactions were run in triplicate with
standards made from environmental amplicons as described in the
"Methods" in Appendix. TaqMan® (Applied Biosystems) chemistry
was used to detect amplification of Bacteria and Thaumarchaeota
16S rRNA genes (rrs) following Kalanetra et al. (2009); all
other amplifications were detected using SYBR® Green Supermix
(Bio-Rad). We compared two primer sets for detecting Archaeal
amoA: Arch-amoA-for and Arch-amoA-rev ("Wuchter primers";
Wuchter et al., 2006) and ArchamoAF and ArchamoAR ("Francis
primers"; Francis et al., 2005). Reactions using the Wuchter
primers were set up as described in Kalanetra et al. (2009), while
PCR conditions for the Francis primers followed Santoro et al.
(2010), except that SYBR® Green Supermix (Bio-Rad) was used
with no additional MgCl2. Amplification of pSL12 rrs followed
Mincer et al. (2007), with the number of amplification cycles
reduced to 40 to prevent quenching of the fluorescence signal.
Archaeal accA genes were amplified following Yakimov et al.
(2009) with shorter cycle lengths (Hu et al., 2011). Specificity of
SYBR® Green reactions was confirmed by melting curve analysis;
accA amplicons were also checked by sequencing clones created
with qPCR primers Crena_529F and Crena_981R (Yakimov et al.,
2009). We also tested published primers for hcd genes (Offre et al.,
2011), but non-specific amplification rendered them
unsuitable for qPCR with our samples (see Appendix). Inhibition
of qPCR reactions was tested by diluting DNA 10- to 1,000-fold and
running the dilutions in the Bacterial rrs qPCR assay; samples that
yielded a higher copy number than expected from the dilution factor
were considered to contain PCR inhibitors and were run in all other
gene assays at the dilution that gave the highest copy number.
Calculations of gene abundance and ratios are discussed in the
"Methods" in Appendix, and qPCR efficiencies for reactions are
reported in Table A1 in Appendix.
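Absolute quantification of this kind rests on a standard curve relating threshold cycle (Ct) to log10 template copies. The sketch below illustrates that arithmetic under stated assumptions: a least-squares fit of Ct against log10(copies), efficiency from the slope as E = 10^(-1/slope) - 1, and a hypothetical scaling from copies per reaction to copies per liter filtered. The template volume, extract volume, and dilution arguments are illustrative parameters, not values from this study (the calculations actually used are in the "Methods" in Appendix), and the inhibition test is one possible encoding of the dilution check with an assumed tolerance factor.

```python
from statistics import mean


def fit_standard_curve(log10_copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    x_bar, y_bar = mean(log10_copies), mean(ct_values)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(log10_copies, ct_values))
    sxx = sum((x - x_bar) ** 2 for x in log10_copies)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def efficiency(slope):
    """Amplification efficiency; a slope near -3.32 corresponds to E ~ 1.0 (100%)."""
    return 10 ** (-1.0 / slope) - 1.0


def copies_in_reaction(ct, slope, intercept):
    """Invert the standard curve to estimate template copies in one reaction."""
    return 10 ** ((ct - intercept) / slope)


def copies_per_liter(ct_replicates, slope, intercept, template_ul,
                     extract_ul, liters_filtered, dilution=1.0):
    """Hypothetical scaling: mean of replicate reactions -> copies per liter filtered."""
    per_reaction = mean(copies_in_reaction(ct, slope, intercept) for ct in ct_replicates)
    per_ul_extract = per_reaction * dilution / template_ul  # undo template dilution
    return per_ul_extract * extract_ul / liters_filtered


def looks_inhibited(undiluted_copies, diluted_copies, dilution_factor, tolerance=2.0):
    """Flag inhibition when a diluted template, scaled back up by its dilution
    factor, yields clearly more copies than the undiluted template.
    The tolerance factor is an assumption, not a value from this study."""
    return diluted_copies * dilution_factor > tolerance * undiluted_copies


if __name__ == "__main__":
    # Synthetic standards (not data from this study): Ct = -3.32 * log10(N) + 40
    logs = [2, 3, 4, 5, 6, 7]
    cts = [-3.32 * x + 40 for x in logs]
    m, b = fit_standard_curve(logs, cts)
    print(round(m, 2), round(b, 2), round(efficiency(m), 3))
    print(copies_per_liter([26.72, 26.72, 26.72], m, b,
                           template_ul=1, extract_ul=100, liters_filtered=1))
```

Averaging copy numbers across the triplicate reactions (rather than averaging Ct first) is a design choice made here for simplicity; either convention may be used in practice.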

PHYLOGENETIC ANALYSIS 

We sequenced cloned rrs, amoA, and accA amplicons to obtain
phylogenetic descriptions of the Thaumarchaeota populations in
the study area and to verify the specificity of qPCR reactions. Libraries
were generated from samples collected at Station D5, located on
the southern edge of the area influenced by the Mississippi River
plume and over the continental slope (Figure 1), using methods
described previously (Kalanetra et al., 2009) and summarized
below. This station was chosen for its depth and as representative of
slope stations influenced by hypoxia. We compared samples from
different depths at this station because others (e.g., Lam et al., 2007;
Beman et al., 2008; Kalanetra et al., 2009; Church et al., 2010;
Santoro et al., 2010) have shown segregation of Thaumarchaeota
populations by depth. rrs and amoA were amplified from DNA
collected at 100 and 200 m, while accA amplicons were generated
from samples collected at 2, 50, 100, 200, and 450 m to test the
accA primer set across a wider depth range. PCR amplifications of
Archaeal rrs, amoA, and accA used the primers listed in Table A1
in Appendix. Three separate amplifications were pooled to minimize
potential PCR bias and electrophoresed on a 1% agarose gel.
The band of the expected product size was excised, extracted,
and purified using the QIAquick® Gel Extraction Kit (QIAGEN),
and incorporated into a TOPO 4 vector (Invitrogen) prior to
cloning into chemically competent TOP10 E. coli cells with the
TOPO TA cloning kit (Invitrogen) following the manufacturer's
instructions. Clones from each library were selected randomly and
sequenced (Genewiz, Inc.) using the plasmid primer M13F(-21).
Euryarchaeota rrs sequences were identified by BLAST (Zhang
et al., 2000) and not analyzed further.

Sequences were inspected manually and checked for vector
contamination using Geneious v. 5.41. Thaumarchaeota rrs
sequences were checked for chimeras using Bellerophon (Huber
et al., 2004); three chimeric sequences were identified and
discarded. Nucleotide and inferred amino acid sequences for amoA
and accA were aligned in Geneious, while rrs nucleotide sequences
were first aligned using the Silva aligner (v. 1.2.5; Pruesse et al.,
2007) and then imported into ARB (v. 5.2; Ludwig et al., 2004),
manually trimmed, and inspected for alignment errors. Sequences
obtained from these libraries have been deposited in GenBank
(NCBI) under accession numbers KC330756 to KC330822 (rrs -
Thaumarchaeota, n = 67), KC330823 to KC330871 (rrs -
Euryarchaeota, n = 49), KC349137 to KC349317 (amoA, n = 181), and
KC349318 to KC349551 (accA, n = 234).

Operational taxonomic units (OTUs) were determined from
sequence alignments using mothur (v. 1.21.1; Schloss et al., 2009)
with cutoffs of 0.02 (>98% similarity) for Thaumarchaeota rrs
and 0.03 (>97% similarity) for Archaeal amoA and accA. Diversity
indices and richness estimates (Shannon, Simpson, Chao, and
ACE) were calculated in mothur. Neighbor-joining trees were
constructed using ARB (Ludwig et al., 2004) with the Jukes-Cantor
correction and 1,000 bootstrap resamplings for nucleotide trees;
protein trees were constructed without the Kimura correction and
resampled 100 times. Trees were edited using FigTree (v. 1.3.1).
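For readers unfamiliar with the diversity calculators named above, the sketch below computes Shannon's H', Simpson's index, and the bias-corrected Chao1 richness estimate from per-OTU sequence counts using textbook formulas. It is an illustration only, not a reproduction of mothur's implementation: mothur's exact conventions may differ, and the ACE estimator is omitted. The OTU labels in the example are hypothetical.

```python
import math
from collections import Counter


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over OTUs with nonzero counts."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def simpson(counts):
    """Simpson's D = sum n_i(n_i - 1) / (N(N - 1)); lower D means higher diversity."""
    n = sum(counts)
    return sum(c * (c - 1) for c in counts) / (n * (n - 1))


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1(F1 - 1) / (2(F2 + 1)),
    where F1 and F2 are the numbers of singleton and doubleton OTUs."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


if __name__ == "__main__":
    # Hypothetical OTU assignments for nine clones (labels are made up).
    otus = ["OTU1"] * 5 + ["OTU2"] * 2 + ["OTU3", "OTU4"]
    counts = list(Counter(otus).values())
    print(round(shannon(counts), 3), round(simpson(counts), 3), chao1(counts))
```

With two singletons among four observed OTUs, Chao1 exceeds the observed richness, reflecting the expectation that rare OTUs remain unsampled.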

PYROSEQUENCING ANALYSES 

We also analyzed the distribution of ribotypes in 41 of our 
52 samples by massively parallel sequencing (pyrosequencing) 
using a Roche 454/FLX instrument running Titanium chemistry.
rrs in DNA extracted from our samples were amplified by
PCR using universal rrs primers 515F and 806R (Table A1 in
Appendix), modified for bar-coded pyrosequencing. PCR protocols
and primer sequences, including barcodes, adaptors, and
linkers, followed Ikitcs et al. (2011). Purified DNA from three
reactions for each sample was pooled to produce a mixture in
which amplicons from each sample were represented equally. The
final mixture was sequenced using standard protocols by Engencore
(University of South Carolina, Columbia, SC, USA). Sequence
data have been deposited with MG-RAST at accession numbers
4509220.3-4509263.3. Metadata are available via the project page:
"Analysis of composition and structure of coastal to mesopelagic 
bacterioplankton communities in the nGoM." 

A total of 435,290 sequences were filtered and trimmed (minimum
length 200 bp, minimum quality score 20; 221,410 sequences
passed) and then sorted into OTUs using the PANGEA pipeline 
(Giongo et al., 2010). Phylogenetic affiliations of these sequences 
were determined by a megablast analysis using a reference set 
of more than 170,000 rrs sequences from described isolates 
obtained from the RDP II database (Giongo et al., 2010). Ampli- 
con sequences were binned into OTUs at domain, phylum, class, 
order, family, genus, and species levels based on megablast results, 
and then grouped into phylogenetic clusters and sorted by station 
and depth (average number of sequences per sample: 5,400; range 
764-9,176). The PANGEA pipeline assigns all Archaea sequences 
to one group that also includes divergent Bacteria sequences. In 
order to more accurately assess the proportion of Thaumarchaeota 
in our samples, we manually enumerated hits to Thaumarchaeota 
in the megablast output for each sample. We also counted hits to 
known AOB, NOB, and Euryarchaeota. 
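The length and quality thresholds and the manual tally of megablast hits can be sketched as follows. This is not the PANGEA pipeline: it assumes each read carries per-base Phred scores, that the quality criterion is a mean score of at least 20 (PANGEA's exact rule may differ), and that each read's best megablast hit is available as a taxonomy string. The `Read` class, the function names, and the keyword matching are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Read:
    seq: str
    quals: List[int]  # Phred score for each base


def passes_filter(read: Read, min_len: int = 200, min_mean_q: float = 20.0) -> bool:
    """Keep reads that are long enough and whose mean Phred score meets the cutoff."""
    if len(read.seq) < min_len or not read.quals:
        return False
    return sum(read.quals) / len(read.quals) >= min_mean_q


def fraction_matching(best_hits: Sequence[str], keywords: Sequence[str]) -> float:
    """Fraction of reads whose best-hit taxonomy string contains any keyword."""
    if not best_hits:
        return 0.0
    hits = sum(1 for h in best_hits if any(k.lower() in h.lower() for k in keywords))
    return hits / len(best_hits)


if __name__ == "__main__":
    reads = [Read("A" * 250, [30] * 250),   # kept
             Read("A" * 150, [35] * 150),   # too short
             Read("A" * 300, [12] * 300)]   # mean quality too low
    print(len([r for r in reads if passes_filter(r)]))
    hits = ["Archaea;Thaumarchaeota;Nitrosopumilus",
            "Bacteria;Nitrospirae;Nitrospira",
            "Bacteria;Proteobacteria;Alphaproteobacteria"]
    print(fraction_matching(hits, ["Thaumarchaeota"]))
```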

Thaumarchaeota rrs sequences obtained from pyrosequencing
were included in phylogenetic analysis using mothur (v. 1.21.1;
Schloss et al., 2009). Unique sequences were grouped together and
aligned against the Silva Archaea reference database. The resulting
alignment, including rrs sequences from Station D5 clone libraries
and outgroups, was trimmed to a set length, and eight chimeric
sequences were removed with Uchime (Edgar et al., 2011); additional
potential chimeras and erroneous sequences were checked
manually using BLAST and removed if necessary. The remaining
23,677 Thaumarchaeota sequences were clustered, and representatives
from each OTU were obtained. A maximum likelihood tree
was constructed using representative sequences grouped at 98%
similarity (2,772 sequences total) with the RAxML program
(Stamatakis et al., 2005) within ARB (Ludwig et al., 2004); 100 trees
were generated using rapid bootstrap analysis, and the consensus
tree was constructed from these iterations. Rarefaction analysis
was completed using mothur as described for clone library
samples above. The Bacteria populations of these samples are analyzed
in King et al. (2013).

http://www.mothur.org/wiki/Silva_reference_files
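Rarefaction asks how many OTUs would be expected if only n sequences had been drawn from a sample. A minimal sketch using the analytical (hypergeometric) expectation, E[S_n] = sum over OTUs of [1 - C(N - N_i, n) / C(N, n)], is shown below. As we understand it, mothur's rarefaction.single instead estimates curves by repeated random subsampling, whose average converges on this same expectation; the function names and example counts are hypothetical.

```python
import math


def rarefied_richness(counts, n):
    """Expected number of OTUs in a random subsample of n sequences drawn
    without replacement: sum over OTUs of 1 - C(N - N_i, n) / C(N, n)."""
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("subsample size must be between 0 and the total count")
    denom = math.comb(total, n)
    # math.comb returns 0 when n > N - N_i, i.e. that OTU is certain to be seen.
    return sum(1 - math.comb(total - c, n) / denom for c in counts)


def rarefaction_curve(counts, step=1):
    """Expected richness at subsample sizes step, 2*step, ..., up to N."""
    total = sum(counts)
    return [(n, rarefied_richness(counts, n)) for n in range(step, total + 1, step)]


if __name__ == "__main__":
    counts = [5, 2, 1, 1]  # hypothetical OTU counts
    for n, s in rarefaction_curve(counts, step=3):
        print(n, round(s, 2))
```

Python's arbitrary-precision integers keep the binomial coefficients exact, so the ratio stays accurate even for libraries of several thousand sequences.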

STATISTICAL ANALYSES 

Model II ordinary least squares pairwise regressions were cal- 
culated following Legendre and Legendre (1998) using software 
available at the R- Project web site^. Coefficients of determina- 
tion and confidence limits of regression equations were calculated 
from 999 bootstrap permutations. PRIMER (v.6; Clarke and Gor- 
ley, 2006) was used to compare environmental and biological data 
from each station. We normalized environmental data in PRIMER 
to reduce the influence of variable unit scales before principal 
components analysis (PCA). The software package CANOCO (v. 
4.5; ter Braak and Smilauer, 2002) was used for canonical cor- 
respondence analysis (CCA; ter Braak, 1986) using PCA values 
and log- transformed qPCR gene abundances. Significance of CCA 
was determined using 499 Monte-Carlo permutations (reduced 
model) as recommended in the program documentation. The 
RAxML tree constructed from 454-generated Thaumarchaeota 
rrs sequences was used in Fast UniFrac (Hamady et al., 2009) 
to investigate phylogenetic patterns by sample location and depth. 
Weighted abundances of sequences within samples were used in 
both Principal Coordinates Analysis (PCoA) and sample cluster- 
ing, as well as to calculate pairwise Unifrac distances. Counts 
were normalized to reduce the influence of larger sample sizes 
(greater number of sequences) at certain stations. The significance 
of sample clusters was tested using 100 jackknife permutations and 
resampling of the minimum (2), first quartile (100), or median 
(520) number of sequences across all samples; any sample contain- 
ing less than the number of re-sampled sequences was eliminated 
from the analysis. 
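The jackknife step described above (discarding samples smaller than the resampling depth, then repeatedly subsampling the rest to that depth) can be sketched as follows. The sketch only prepares the subsampled OTU tables and relative abundances; computing weighted UniFrac distances on the RAxML tree, which Fast UniFrac did in this study, is outside its scope. Function names and sample counts are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List


def subsample(counts: Counter, depth: int, rng: random.Random) -> Counter:
    """Draw `depth` sequences without replacement from one sample's OTU counts."""
    pool = list(counts.elements())  # one entry per sequence
    return Counter(rng.sample(pool, depth))


def jackknife_tables(samples: Dict[str, Counter], depth: int,
                     n_reps: int = 100, seed: int = 0) -> List[Dict[str, Counter]]:
    """Drop samples with fewer than `depth` sequences, then build `n_reps`
    subsampled tables in which every retained sample has exactly `depth` sequences."""
    rng = random.Random(seed)
    kept = {name: c for name, c in samples.items() if sum(c.values()) >= depth}
    return [{name: subsample(c, depth, rng) for name, c in kept.items()}
            for _ in range(n_reps)]


def relative_abundance(counts: Counter) -> Dict[str, float]:
    """Normalize counts to proportions so deeply sequenced samples do not dominate."""
    total = sum(counts.values())
    return {otu: c / total for otu, c in counts.items()}


if __name__ == "__main__":
    samples = {"sample_1": Counter(a=300, b=150, c=70),  # hypothetical counts
               "sample_2": Counter(a=40, d=50)}          # only 90 sequences
    tables = jackknife_tables(samples, depth=100, n_reps=3)
    print(sorted(tables[0]))  # sample_2 is dropped at this depth
    print(relative_abundance(tables[0]["sample_1"]))
```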

RESULTS 

GENE ABUNDANCE AND DISTRIBUTION 

The abundance of Bacterial rrs in these samples ranged from 10^ to
10^ copies L⁻¹ (Table 1; Table A2 in Appendix). Thaumarchaeota
rrs genes were present in the same samples at up to 10^ copies L⁻¹
(Table A2 in Appendix), with population maxima typically occurring
between 100 and 200 m depth and at lower [O2] and temperature
(Figure 2). The abundance of rrs genes attributable to the
pSL12-like clade was much lower, near the limit of detection (see
Table A1 in Appendix) in most samples, with a maximum abundance
of 10^ copies L⁻¹ (Table A2 in Appendix). pSL12 rrs followed depth
trends similar to those of Thaumarchaeota rrs, though pSL12 rrs
abundance was generally 100- to 10,000-fold lower (Figure 2), except
in one sample (Station H1-7 m), where pSL12 rrs was 10% of
Thaumarchaeota rrs. No Thaumarchaeota rrs were detected at the
freshwater Mississippi River station (MR1-2 m), where pSL12 rrs was
present at 10^ copies L⁻¹ (Table A2 in Appendix).

Table 1 | Summary of qPCR-estimated gene abundances (copies L⁻¹) determined for samples from the northern Gulf of Mexico.

Gene            Near-surface inshore                       Near-surface offshore                      Deep offshore
Thaum. W amoA   3.86 × 10^ (9.74 × 10^ to 1.74 × 10^)      1.16 × 10^ (3.91 × 10^ to 3.29 × 10^)      3.68 × 10^ (4.65 × 10^ to 2.12 × 10^)
Thaum. F amoA   5.82 × 10^ (9.28 × 10^2 to 2.97 × 10^)     4.19 × 10^ (1.24 × 10^ to 1.33 × 10^)      1.11 × 10^ (5.11 × 10^ to 5.86 × 10^)
Thaum. accA     1.29 × 10^ (9.09 × 10^2 to 1.00 × 10^)     5.66 × 10^ (3.16 × 10^2 to 2.79 × 10^)     8.72 × 10^ (1.48 × 10^ to 1.80 × 10^)
Thaum. rrs      1.85 × 10^ (1.37 × 10^ to 1.10 × 10^)      6.95 × 10^ (4.79 × 10^ to 2.14 × 10^)      1.79 × 10^ (3.23 × 10^ to 5.45 × 10^)
pSL12 rrs       4.77 × 10^ (1.12 × 10^ to 3.30 × 10^)      1.30 × 10^ (1.89 × 10^ to 3.98 × 10^3)     1.00 × 10^ (3.52 × 10^ to 2.92 × 10^)
AOB amoA        3.67 × 10^ (6.09 × 10^ to 2.10 × 10^)      2.81 × 10^ (1.67 × 10^2 to 7.07 × 10^3)    2.93 × 10^ (1.34 × 10^ to 8.80 × 10^)
Bacteria rrs    3.20 × 10^ (3.11 × 10^ to 1.26 × 10^10)    7.16 × 10^ (3.48 × 10^ to 1.34 × 10^9)     2.14 × 10^ (2.49 × 10^ to 1.83 × 10^)

Values are means, with ranges in parentheses. amoA W, amplified with the Wuchter et al. (2006) amoA primer set; amoA F, amplified
with the Francis et al. (2005) amoA primer set. "Near-surface" is <100 m depth; "deep" is >100 m depth; "inshore," over the continental shelf (seafloor depth <100 m);
"offshore," shelf break and beyond (seafloor depth >100 m).

FIGURE 2 | Depth profiles of the abundance of selected genes
and of environmental variables at Stations (A) A6, (B) D5, and (C)
H6. Gene abundances are given as copies L⁻¹ of sample filtered, as
determined from triplicate qPCR amplifications of Archaeal and
β-Proteobacterial amoA and Archaeal accA (left) and Thaumarchaeota,
pSL12, and Bacterial rrs (center); note that scales for β-Proteobacterial
amoA and pSL12 rrs are reduced by a factor of 10-100 to allow visualization
of variation with depth. Environmental data were taken from a CTD
attached to the frame of the rosette sampler (right). Sampling depths
are shown as X's on the depth axis; missing points indicate that the
measurement was below the limit of detection (see Table A1 in
Appendix for detection limits).

Thaumarchaeota accounted for a high proportion (up to 40%
by qPCR, up to 54% of pyrosequenced rrs) of the total prokaryotic
community in our samples. This percentage varied with
depth (Figure 3): deeper (>100 m) samples contained an
average of 21% Thaumarchaeota (range 0.5-40%), while samples
from near-surface water (<100 m) contained only 1.8%
Thaumarchaeota (range 0-9%). The proportion also varied with
distance from shore: shallower (<100 m) samples from
inshore stations had fewer Thaumarchaeota than those from
offshore stations (1.1 versus 2.8% of prokaryotes, respectively).
Pyrosequencing also showed that Thaumarchaeota rrs genes were
most abundant in samples from depths of 100-200 m, though they
were present at low abundances in all samples except
MR1-2 m (Table A3 in Appendix), in agreement with qPCR
analyses. Thaumarchaeota accounted for 0.1-54% of the prokaryotes
in pyrosequencing libraries, and their distributions based on
qPCR estimates of gene abundance compared favorably with the
contribution of Thaumarchaeota ribotypes to pyrosequenced rrs
libraries from these samples (Figure 4; model II regression, n = 41,
r² = 0.82, 95% CL of slope = 0.54-0.73).
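A Model II regression is appropriate here because both the qPCR and the pyrosequencing fractions carry measurement error. The Methods describe "Model II ordinary least squares" regressions computed with R software; because several line-fitting methods are reported under the Model II heading, the sketch below simply assumes the standardized major axis (SMA, also called reduced major axis) and estimates a percentile bootstrap interval for its slope. It is an illustration in Python, not the analysis actually run, and the example data are hypothetical.

```python
import math
import random
from statistics import mean, stdev


def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def sma_fit(x, y):
    """Standardized major axis: slope = sign(r) * s_y / s_x, line through the means.
    Returns (slope, intercept, r squared)."""
    r = pearson_r(x, y)
    slope = math.copysign(stdev(y) / stdev(x), r)
    return slope, mean(y) - slope * mean(x), r * r


def bootstrap_slope_ci(x, y, n_boot=999, alpha=0.05, seed=1):
    """Percentile bootstrap interval for the SMA slope, resampling (x, y) pairs.
    Requires at least two distinct x and y values in the input."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        bx, by = [x[i] for i in idx], [y[i] for i in idx]
        if len(set(bx)) < 2 or len(set(by)) < 2:
            continue  # skip degenerate resamples with zero variance
        slopes.append(sma_fit(bx, by)[0])
    slopes.sort()
    lo = slopes[int(n_boot * alpha / 2)]
    hi = slopes[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi


if __name__ == "__main__":
    qpcr = [0.01, 0.05, 0.12, 0.20, 0.31, 0.40]  # hypothetical fractions
    pyro = [0.02, 0.04, 0.10, 0.15, 0.22, 0.30]
    print(sma_fit(qpcr, pyro))
    print(bootstrap_slope_ci(qpcr, pyro))
```

Unlike an ordinary least-squares slope, the SMA slope is symmetric in x and y: swapping the axes gives the reciprocal slope.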

Archaeal amoA was present at up to 10^ copies L⁻¹ (Table 1;
Figure 2; Table A2 in Appendix). Bacterial amoA was at the limit
of detection (Table A1 in Appendix) in most samples, with a
maximum of 10^ copies L⁻¹. The AOA:AOB amoA ratio averaged
2100:1 (Wuchter primers) to 3300:1 (Francis primers). The ratio
of Bacterial amoA to Bacterial rrs averaged 0.001
across all samples, with a maximum of 0.05 at Station D3-68 m
(Figure ASA in Appendix). Abundances of accA genes ranged from
the limit of detection (10^ copies L⁻¹) to 10^ copies L⁻¹ (Table 1;
Figure 2; Table A2 in Appendix). Archaeal amoA (quantified using
Wuchter primers) showed a depth distribution similar to that of
Thaumarchaeota rrs (Figure 2). However, accA abundances showed the
opposite trend with depth, leading to higher ratios of amoA:accA
or rrs:accA in near-surface (<100 m) water (Figure 2; Table 2).
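Per-sample gene ratios like those summarized in Table 2 (each gene divided by rrs in the same sample, then reported as a group mean and range) can be computed as in the sketch below. Skipping samples with a missing or zero denominator is an assumption about how values below detection were handled, and the function name and example values are hypothetical. Note that the mean of per-sample ratios generally differs from the ratio of group means; the text does not state which convention was used for the AOA:AOB averages.

```python
from statistics import mean
from typing import Optional, Sequence, Tuple


def ratio_summary(numerators: Sequence[Optional[float]],
                  denominators: Sequence[Optional[float]]) -> Tuple[float, float, float]:
    """Mean, minimum, and maximum of per-sample ratios, skipping samples where
    the numerator is missing (None) or the denominator is missing or zero."""
    ratios = [n / d for n, d in zip(numerators, denominators)
              if n is not None and d]
    if not ratios:
        raise ValueError("no samples with usable values")
    return mean(ratios), min(ratios), max(ratios)


if __name__ == "__main__":
    amoA = [2.0e5, 8.0e4, None, 3.0e6]  # hypothetical copies per liter
    rrs = [1.0e5, 1.0e5, 2.0e5, 2.0e6]
    m, lo, hi = ratio_summary(amoA, rrs)
    print(f"{m:.2f} ({lo:.2f}-{hi:.2f})")  # same "mean (range)" layout as Table 2
```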



We used PCA (Figure A3 in Appendix) to identify samples
from similar environments and group them into a few categories
to simplify comparisons. The first two PCA axes explained
63.2% of the variation between samples (Figure A3; Table A5 in
Appendix), which supported placing stations into three groups:
near-surface inshore, near-surface offshore, and deep offshore.
CCA was included (Figure 8) to investigate relationships
between gene abundances and environmental conditions (similar
to BEST analysis, see Appendix). The primary CCA axis
(CCA1) explained 47.9% of the gene abundance-environment
relationship; adding the second axis (CCA2) explained a further
43.8% (91.7% total; Figure 8; Table A6 in Appendix). A global
permutation test gave a statistically significant result (p < 0.05)
for station groupings based on both canonical axes considered
together (F = 2.26, p = 0.014), while CCA1 considered alone did not
significantly explain the gene abundance-environment relationship
(F = 8.43, p = 0.086). Thaumarchaeota rrs abundance was
negatively correlated with most environmental variables, except
for salinity and depth (Figure 8). Bacterial rrs abundance correlated
positively with euphotic zone depth and had a strong negative
correlation with pH, with little influence from any variable primarily
contributing to CCA2 (beam attenuation, oxygen; Figure 8).
The distribution of Archaeal amoA genes as assessed with the
Wuchter primers, in contrast, was not strongly influenced by variables
contributing to CCA1 (fluorescence, pH, latitude, longitude;
Figure 8) but showed a weak positive correlation with temperature
and beam attenuation (turbidity). Archaeal amoA gene abundance
assessed by the Francis primers showed the opposite trend, with
strongest positive correlations to latitude (which covaries with
distance offshore and depth in this region) and oxygen concentrations
(Figure 8). Bacterial amoA gene abundance correlated with beam
attenuation (turbidity) and temperature (positive correlation), as
well as depth (negative correlation). accA gene abundance had
strong positive correlations with relative fluorescence (chlorophyll
a equivalents) and pH (Figure 8).

FIGURE 3 | Abundance of Thaumarchaeota as a percentage of total
bacterioplankton plotted against sample depth.

FIGURE 4 | Fraction of Thaumarchaeota rrs found in 454
pyrosequencing libraries versus the fraction of Thaumarchaeota rrs
determined from qPCR data. Line represents a model II pairwise
regression.

Table 2 | Means and ranges of the ratios of Thaumarchaeota gene abundances.

                        amoA W:rrs         amoA F:rrs          accA:rrs
Near-surface inshore    2.5 (0.71-6.6)     0.32 (0.002-0.69)   0.06 (0.001-0.22)
Near-surface offshore   1.2 (0.17-1.8)     0.62 (0.28-1.9)     0.04 (0.0002-0.17)
Deep offshore           0.19 (0.001-1.0)   0.57 (0.16-1.1)     0.58 (0.07-1.3)

Gene ratios were calculated by dividing the abundance of each of the genes tested
by the abundance of rrs in the same sample. amoA W, amplified with the Wuchter
et al. (2006) amoA primer set; amoA F, amplified with the Francis et al. (2005) amoA
primer set.

THAUMARCHAEOTA COMMUNITY COMPOSITION AT STATION D5

Phylogenetic analysis of 67 Sanger-sequenced Thaumarchaeota
rrs sequences obtained from 100 and 200 m depth at Station
D5 revealed 10 different OTUs (Figure 5; Table A4 in Appendix;
98% similarity cutoff). All but one of the sequences retrieved
from the 100 m sample clustered into a single OTU (the "Near-Surface
Group," Figure 5), which also contained one sequence
retrieved from the 200 m sample and the reference sequence from
Nitrosopumilus sp. NM25 (AB546961; Matsutani et al., 2011).
We did not retrieve any sequences related to the marine pSL12-like
clade. Sequences retrieved from the 200 m sample displayed
greater richness and evenness (Table A4 in Appendix; 9 OTUs)
and included some OTUs that appear unique to the northern Gulf
of Mexico.

We retrieved 184 amoA sequences from Station D5. Phylogenetic
analysis of the translated and aligned amino acid
sequences revealed two OTUs (similarity cutoff of 97%) of
AmoA (Figure 6A): one containing primarily near-surface (100 m)
sequences ("Group A" following Beman et al., 2008) and the
other dominated by sequences from 200 m ("Group B"). amoA
nucleotide sequences also grouped primarily by depth, but with
greater richness and diversity (Table A4 in Appendix) at a given
depth than we observed for Thaumarchaeota rrs genes. Clusters
of sequences that appear to be unique to the Gulf of Mexico
were observed in both 100 and 200 m samples (Figure A1A in
Appendix).

FIGURE 5 | Phylogenetic analysis of Thaumarchaeota rrs genes
retrieved from Station D5. Clone libraries were generated from DNA in
samples collected at depths of 100 m (green) and 200 m (blue). The
Neighbor-Joining tree was constructed using ARB (Ludwig et al.,
2004). Reference sequences in bold are from isolates or enrichment
cultures of AOA. Bootstrap values were obtained from 1000
resamplings; only values above 75% bootstrap support are shown.

FIGURE 6 | Phylogenetic analysis of inferred amino acid sequences
from (A) amoA and (B) accA gene sequences retrieved from Station D5.
Numbers beside groups (in triangles) indicate the number of sequences
from each depth sampled, according to color: clades in green are from
2, 50, or 100 m; clades in blue are from 200 or 450 m.
Neighbor-Joining trees were constructed with ARB (Ludwig et al.,
2004) from sequences 199 aa (AmoA) or 137 aa (AccA) in length.
Sequences in bold were obtained from isolates or enrichment cultures
of AOA. Bootstrap values were obtained from 100 resamplings; only
values above 75% bootstrap support are shown.

The top BLASTx hits for all but 30 of the 257 sequences obtained from
accA amplicons were to carboxylase or carboxyltransferase genes from
Archaea. The remaining 30 amplicons were most similar to
non-Thaumarchaeota reference sequences, with low (<65%) sequence
identities. Because they did not return hits to Thaumarchaeota
reference sequences, we did not consider them further.
Phylogenetic analysis of the inferred amino acid sequences for AccA
(Figure 6B) revealed three major OTUs: OTU 1 contained a majority of
near-surface sequences (2, 50, and 100 m), while OTUs 2 and 3
contained mostly sequences from deep water (200 and 450 m). Analysis
of accA nucleotide sequences revealed depth-related clusters similar
to those of the inferred AccA amino acid sequences and of
Thaumarchaeota rrs genes (Figure A1B in Appendix), with a total of 51
OTUs observed at a 97% similarity cutoff (Table A4 in Appendix). Some
of these OTUs appear to be unique to the Gulf of Mexico (Figure A1B
in Appendix), but this may be an artifact of the limited
representation of accA sequences in reference databases.
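OTUs in this section are defined by pairwise similarity cutoffs (98% for rrs, 97% for amoA and accA). The clustering software is not named here, so the sketch below illustrates the idea with a simple greedy, seed-based scheme on pre-aligned sequences of equal length, with identity computed only over columns where neither sequence has a gap. This is an assumed, simplified technique, not a reimplementation of any particular tool; the `mutate` helper and the clone names are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Identity (%) of two aligned sequences, ignoring columns with a gap in either."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-." or y in "-.":
            continue  # skip gap columns
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


def greedy_otus(seqs: dict[str, str], cutoff: float) -> list[list[str]]:
    """Assign each sequence to the first OTU whose seed is >= cutoff % identical.

    Sequences are processed in input order; a sequence matching no seed founds a new OTU.
    """
    otus: list[list[str]] = []
    seeds: list[str] = []
    for name, seq in seqs.items():
        for members, seed in zip(otus, seeds):
            if percent_identity(seq, seed) >= cutoff:
                members.append(name)
                break
        else:
            otus.append([name])
            seeds.append(seq)
    return otus


def mutate(seq: str, positions) -> str:
    """Hypothetical helper: introduce transitions at the given positions."""
    swap = {"A": "G", "G": "A", "C": "T", "T": "C"}
    chars = list(seq)
    for i in positions:
        chars[i] = swap[chars[i]]
    return "".join(chars)


base = "ACGT" * 25  # 100-column toy alignment
aligned = {
    "clone1": base,
    "clone2": mutate(base, [5, 60]),            # 98% identical to clone1
    "clone3": mutate(base, range(0, 100, 10)),  # 90% identical to clone1
}
print(greedy_otus(aligned, 97.0))  # [['clone1', 'clone2'], ['clone3']]
```

Greedy results depend on input order (seeds are the first sequences seen), which is one reason published pipelines often sort by abundance or length first.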



PYROSEQUENCING: PHYLOGENETIC PATTERNS AND SAMPLE GROUPINGS

Microbial community composition varied dramatically with depth, as
shown by comparisons of libraries from surface (<25 m depth) versus
subsurface (>100 m depth) samples (Figure A2 in Appendix, Table A3 in
Appendix; these data are discussed fully in King et al., 2013).
Proteobacteria, especially α- and γ-Proteobacteria, dominated the
microbial community of near-surface waters at most stations.
Consistent with the distributions of rrs and amoA indicated by qPCR
analyses, Thaumarchaeota were greatly enriched in deeper waters. Only
14 (out of a total of 221,410) rrs sequences binned to AOB,
confirming the much lower abundance of AOB relative to AOA found by
qPCR quantification of amoA. Half of the AOB sequences were retrieved
from one sample, MR1-2m, taken upstream of the mouth of the
Mississippi River at a salinity of 0. Only four Thaumarchaeota
sequences were retrieved from this sample (Table A3 in Appendix), two
of which were most similar to the terrestrial thaumarchaeote
"Candidatus Nitrososphaera gargensis" strain EN76, at 15% similarity.

Sequences most closely related to NOB were retrieved from most
samples (mean = 0.4%, range 0-1.8% of prokaryotes, calculated as
described in "Methods" in Appendix but assuming 2 rrs per NOB genome,
following Mincer et al., 2007). These sequences were primarily
identified as Nitrospina sp. 3005 (AM110965), though Nitrospira
ribotypes were also detected. The abundance of NOB rrs was greatest
at depth (~200 m; Table A3 in Appendix, Figure 7A) and was
significantly correlated with the abundance of Thaumarchaeota in the
same samples (Figure 7A; model II regression, n = 41, r² = 0.49,
95% CL of slope = 0.032-0.064). Euryarchaeota accounted for only a
few percent of the microbial community (mean 5.8%, range
0.1-17.6%). Euryarchaeota were most abundant in near-surface samples
(<100 m; Table A3 in Appendix), and their abundance was poorly
correlated with the abundance of Thaumarchaeota (Figure 7B; model II
regression, n = 41, r² = 0.14, 95% CL of slope = 0.021-0.20).
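The NOB and Euryarchaeota comparisons are reported as model II regressions with r² and 95% confidence limits on the slope. The specific model II variant is not stated in this section; as an assumption, the sketch below implements standard (reduced) major axis regression, one of the model II methods described by Legendre and Legendre (1998), with the usual t-based confidence limits for the slope. The abundance values in the usage lines are hypothetical.

```python
import numpy as np
from scipy import stats


def sma_regression(x, y, alpha=0.05):
    """Standard major axis (model II) regression of y on x.

    Returns slope, intercept, r^2 and the (1 - alpha) confidence limits of the slope.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    if n < 3 or y.size != n:
        raise ValueError("need at least 3 paired observations")
    # Clip guards against |r| drifting past 1 through rounding
    r = float(np.clip(np.corrcoef(x, y)[0, 1], -1.0, 1.0))
    # SMA slope: ratio of standard deviations, signed by the correlation
    slope = np.sign(r) * y.std(ddof=1) / x.std(ddof=1)
    intercept = y.mean() - slope * x.mean()
    # Confidence limits for the SMA slope (Jolicoeur and Mosimann form)
    t = stats.t.ppf(1 - alpha / 2, n - 2)
    b = t**2 * (1 - r**2) / (n - 2)
    lower = slope * (np.sqrt(b + 1) - np.sqrt(b))
    upper = slope * (np.sqrt(b + 1) + np.sqrt(b))
    return slope, intercept, r**2, (min(lower, upper), max(lower, upper))


thaum = [0.5, 2.0, 8.0, 15.0, 22.0, 30.0]   # hypothetical % of prokaryotes
nob = [0.02, 0.10, 0.35, 0.80, 1.10, 1.60]  # hypothetical % of prokaryotes
slope, intercept, r2, (lo, hi) = sma_regression(thaum, nob)
print(f"slope={slope:.3f}, r2={r2:.2f}, 95% CL={lo:.3f}-{hi:.3f}")
```

Unlike ordinary least squares, SMA treats both variables as measured with error, which suits comparing two abundances estimated from the same libraries.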

UniFrac distances calculated between samples indicate significant
(p < 0.05) similarities in Thaumarchaeota rrs assemblages among
offshore near-surface samples and inshore near-surface samples from
Stations A2, A4, D3, E2, and MR2 (data not shown). The Station
D5-100 m sample was assigned to the near-surface group (p < 0.05)
regardless of the method used to obtain rrs sequences
(pyrosequencing versus Sanger sequencing from clone libraries). Among
deep offshore samples, those from 160-950 m were similar to each
other (p < 0.05); sequences from clone libraries generated from
Station D5-200 m were also included in this group. The phylogenetic
composition of Thaumarchaeota rrs in the deepest sample, Station
A6-1700 m, was similar only to samples from D5-900 m and F6-950 m
(p < 0.05).

FIGURE 7 | Comparison of the abundance of rrs from (A)
Nitrite-Oxidizing Bacteria and (B) Euryarchaeota versus
Thaumarchaeota rrs in samples from the northern Gulf of Mexico.
Triangles, near-surface (<100 m) samples; squares, deep (>100 m)
samples. Lines are model II regressions (Legendre and Legendre, 1998)
of all data.

Analysis of phylogenetic patterns across samples using PCoA in Fast
UniFrac (Figure 9) revealed two major groups of pyrosequenced
Thaumarchaeota rrs: one of deep (>100 m) samples and another
including the near-surface samples (both inshore and offshore), which
agrees with the PCA groupings (Figure A3 in Appendix). The primary
PCoA axis explained 70% of the variation in phylogenetic composition
of the samples, with the secondary axis explaining an additional 11%
(total 81%). The sample from Mississippi River Station MR1 was an
outlier; however, PCoA analysis with this sample included revealed
the same general pattern (Figure A7 in Appendix). Samples clustered
using the minimum resampling depth of 2 sequences (Figure A4A in
Appendix) showed significant separation only of the Station MR1
sample from the rest of the samples (>99.9% jackknife support). For
100 resampled sequences (32 of 43 samples; Figure A4B in Appendix), a
clear separation was observed between surface and deep samples (60%
support) and between near-surface inshore samples (excluding Station
A4) and near-surface offshore samples (>99.9% support). When the
median number of sequences (520 sequences, 22 of 43 samples; Figure
A4C in Appendix) was used for cluster analysis, the separation of
deep and near-surface samples was statistically significant (>99.9%
support). Samples from Station D3 (inshore, <100 m depth) clustered
most closely (>99.9% support), followed by inshore Station A4-43 m and
offshore Station A6-80 m (95% support). Among the deep offshore
samples, a further separation was observed, with the deepest samples
(Stations D5-900 m and F6-950 m) and those from 350-760 m forming
distinct clusters 50 and 61% of the time, respectively (Figure A4C in
Appendix).
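The percentages quoted for the PCoA axes (70% and 11%) are the shares of variation captured by each principal coordinate. Fast UniFrac computed these internally; purely as an illustration, the sketch below performs classical PCoA (Gower double-centering followed by an eigendecomposition) on a precomputed distance matrix with NumPy. It reports each axis as a share of the positive eigenvalue sum, which is one common convention for non-Euclidean distances. The 4×4 matrix is hypothetical, not UniFrac output from this study.

```python
import numpy as np


def pcoa(dist):
    """Classical principal coordinates analysis (PCoA) of a distance matrix.

    Returns sample coordinates on every axis with a positive eigenvalue and the
    proportion of the positive eigenvalue sum explained by each axis.
    """
    d = np.asarray(dist, dtype=float)
    if d.ndim != 2 or d.shape[0] != d.shape[1]:
        raise ValueError("distance matrix must be square")
    n = d.shape[0]
    # Gower double-centering: B = -1/2 * J D^2 J, where J = I - (1/n) * ones
    j = np.eye(n) - np.full((n, n), 1.0 / n)
    b = -0.5 * j @ (d ** 2) @ j
    eigvals, eigvecs = np.linalg.eigh(b)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Drop zero and negative eigenvalues (numerical noise or non-Euclidean distances)
    keep = eigvals > 1e-10 * np.abs(eigvals).max()
    if not keep.any():
        raise ValueError("all samples are identical; no axes to return")
    coords = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    explained = eigvals[keep] / eigvals[keep].sum()
    return coords, explained


# Hypothetical distances: two near-surface samples (S1, S2) and two deep samples (D1, D2).
dist = [
    [0.00, 0.10, 0.60, 0.65],
    [0.10, 0.00, 0.62, 0.60],
    [0.60, 0.62, 0.00, 0.12],
    [0.65, 0.60, 0.12, 0.00],
]
coords, explained = pcoa(dist)
print(np.round(explained, 3))
print(np.round(coords[:, 0], 3))  # axis 1 separates near-surface from deep samples
```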

DISCUSSION 
COMMUNITY COMPARISONS 

We found a strong correlation between qPCR and pyrosequencing
estimates of AOA relative abundance, indicating that, despite
potential biases associated with individual qPCR primers, qPCR
estimates of Thaumarchaeota distributions at this coastal site are
robust. Thaumarchaeota were abundant in deeper waters of the
northern Gulf of Mexico, increasing in abundance with depth to a
broad maximum between ~200 and 800 m (Figures 2 and 3) that
coincided with the oxygen minimum (Figure 2). Two shallow-water
stations (C1, 12 m; MR2, 8 m) contained up to 10^ copies L^-1 of
Thaumarchaeota rrs; both of these stations are near the Mississippi
River plume, which may indicate an influence of riverine nutrients
on AOA. It is important to note, however, that these are marine
ribotypes and not terrestrial or freshwater ribotypes carried into
the Gulf by the Mississippi River, since we did not retrieve similar
ribotypes from Mississippi River sample MR1. In contrast, AOB amoA
genes were below the limit of detection except in a few near-surface
samples from inshore stations (Stations C1, D3, D5, G1, and H1) and
at river stations MR1 and MR2. Consistent with many other studies of
amoA in coastal water columns (Wuchter et al., 2006; Herfort et al.,
2007; Beman et al., 2010), AOA amoA was always >10- to 100-fold more
abundant than AOB amoA. The relative abundance of Thaumarchaeota and
AOB rrs in pyrosequenced libraries (Table A3 in Appendix) is
consistent with the distribution of amoA genes determined by qPCR,
suggesting that the observed AOA:AOB amoA ratio is not an artifact
of primer bias. Although we do not have ammonia oxidation rate
measurements for these samples, the greater abundance of AOA than
AOB amoA suggests that Thaumarchaeota are likely to dominate
nitrification in this region (Beman et al., 2008).

We did not quantify the distribution of NOB by qPCR (cf. Santoro et
al., 2010, whose qPCR assay was limited to Nitrospina); however, we
were able to determine the distribution of all known NOB relative to
Thaumarchaeota from pyrosequenced rrs libraries. We found that NOB
abundance correlated well with that of Thaumarchaeota (r² = 0.49), as
reported by others (Mincer et al., 2007; Santoro et al., 2010). The
correlation between the distributions of these two groups suggests
relatively tight coupling between them, presumably leading to
efficient conversion of ammonia to nitrate in the northern Gulf of
Mexico. However, NOB rrs abundance was only ~5% of that of
Thaumarchaeota (slope of the model II regression; Figure 7A), in
contrast to estimates of 20-100% reported by Mincer et al. (2007) or
~25% reported by Santoro et al. (2010). This ratio depends on the rrs
gene dosages assumed in our calculations; nevertheless, the
discrepancy suggests that alternative pathways, e.g., anammox, might
be more significant for nitrite removal in the northern Gulf of
Mexico than in the temperate Pacific upwelling zone sampled by
Mincer et al. (2007) and Santoro et al. (2010).

ENVIRONMENTAL FACTORS 

The connection between pH and AOA abundance has been examined
closely in soils, where Archaeal amoA typically dominates in more
acidic samples (reviewed in Prosser and Nicol, 2008; Erguder et al.,
2009). The Mississippi River plume is a site of respiration-induced
acidification (Cai et al., 2011), and we observed a negative
correlation between the abundance of Thaumarchaeota rrs and pH in
our samples. In contrast, the abundances of Archaeal accA genes and
of AOA amoA genes detected by the Francis primers were positively
correlated with pH (Figure 8). AOB amoA abundance was positively
correlated with temperature and negatively correlated with depth,
while AOA amoA abundance showed the opposite trends (Figure 8). These
correlations correspond to AOB abundance being greatest in surface
samples, whereas AOA abundance was greater in samples from deeper,
colder water, as observed in other studies (e.g., Santoro et al.,
2010). We also observed a strong negative correlation between AOB
amoA gene abundance and salinity, but we did not find a statistically
significant (p > 0.05) correlation between AOA amoA genes and
salinity. This contrasts with AOA distributions reported for
sediments from an aquifer at Huntington Beach, CA, USA (Santoro et
al., 2008) and from the San Francisco Bay Estuary (Mosier and
Francis, 2008), where AOB were more abundant in high-salinity
sediments, while AOA were more prominent in low-salinity
environments.

Fluorescence (chlorophyll a) contributed significantly to PC1
(Figure A3 in Appendix), and the abundances of accA, pSL12 rrs, and
Archaeal amoA (Francis primers) were all positively correlated with
fluorescence in the CCA (Figure 8). Most other studies have reported
inverse correlations between Thaumarchaeota abundance and chlorophyll
a (Murray et al., 1999a,b; Wells and Deming, 2003; Kirchman et al.,
2007). However, a study of AOA and AOB dynamics in estuarine
sediments showed that potential nitrification rates and the abundance
of Archaeal amoA genes (Wuchter primers) correlated positively with
sediment chlorophyll a concentrations (Caffrey et al., 2007).
Archaeal abundance in the Arctic Ocean near the Mackenzie River mouth
correlated positively with chlorophyll a (Wells et al., 2006),
although a previous study at similar sites showed the opposite trend
(Wells and Deming, 2003). We observed a strong positive correlation
between Bacterial amoA abundance and turbidity in the Gulf of Mexico,
while Archaeal amoA genes were inversely correlated with turbidity
(Figure 8). We detected the greatest abundances of AOB amoA genes in
shallow, near-shore waters (especially at Station C1 and all three
Mississippi River stations), which may indicate a salinity effect or
an association of AOB with particles originating from estuaries,
coastal embayments, or the river. Since we did not sequence the AOB
amplicons we obtained, we cannot use the phylogenetic position of the
AOB to differentiate between these hypotheses (e.g., Phillips et al.,
1999; O'Mullan and Ward, 2005). Caffrey et al. (2007) reported that
AOB were more abundant than AOA in sediments from Weeks Bay, Alabama,
a subembayment of Mobile Bay. Our near-shore waters also had higher
ammonia concentrations (up to 3 µM; data not shown) than other
stations, which is consistent with the conceptual model that AOB are
more competitive in environments with elevated ammonia concentrations
(Martens-Habbena et al., 2009).

Oxygen concentrations are typically higher in surface than in deep
water, especially in this region of the Gulf of Mexico, where bottom
waters become seasonally hypoxic (Rabalais et al., 2002, 2010).
Although samples for this study were collected before hypoxia had
fully developed ([O2] ranged from 3.5 to 8.4 mg L^-1; 150-375 µM), we
found clades of AOA similar to those observed in other hypoxic waters
(Beman et al., 2008; Labrenz et al., 2010; Molina et al., 2010).
Additionally, the distribution of amoA phylotypes detected by the
Francis primers correlated positively with [O2] (as did Archaeal accA
genes), while those detected by the Wuchter primers were not
correlated with [O2] (Figure 8). Our data suggest that these primer
sets have different PCR biases, such that certain AOA ecotypes are
amplified more efficiently by one set than by the other. Because the
amoA phylotypes amplified by each primer set correlated with
different environmental variables, we believe these differences may
reflect ecotype-specific sequence variation, as proposed for the two
primer sets by Beman et al. (2008).

amoA AND accA ABUNDANCE

FIGURE 8 | Canonical correspondence analysis (CCA) ordination plot
of qPCR-estimated abundances for rrs, amoA, and accA genes and
environmental data. The length and angle of the arrows show the
contribution of a particular environmental variable to the CCA axes.
Fluorescence, relative fluorescence (chlorophyll a equivalents);
beam attenuation, turbidity. Eigenvalues, correlation values, and
percentage variance for CCA are given in Table A6 in Appendix.

The abundance of Archaeal amoA genes reported in this study (up to
10^ copies L^-1) is comparable to abundances reported for other
continental shelf regions (Galand et al., 2006; Mincer et al., 2007;
Kalanetra et al., 2009; Santoro et al., 2010), in the mesopelagic
Pacific Ocean (Church et al., 2010), and in hypoxic zones (Beman et
al., 2008; Molina et al., 2010). Estimates of amoA abundance depended
on the primer set used. Previous studies using the Wuchter primers
reported low abundance of amoA relative to rrs in deep waters
(Agogué et al., 2008; De Corte et al., 2009) compared to studies that
used the Francis primers (Beman et al., 2010; Church et al., 2010;
Santoro et al., 2010), suggesting that the Wuchter primers are biased
against deep-water clades of AOA. Our study supports these
conclusions, but we also found that the Francis primers
underestimated amoA abundance relative to rrs in surface water
samples (Figure A6 in Appendix). Comparisons of primer sequences to
alignments of amoA sequences from this study show single base-pair
differences within Wuchter primer binding sites that could affect
primer annealing and thus amplification (Figure A8 in Appendix). Our
findings support the use of two different primer sets for the
quantification of Archaeal amoA in near-surface versus deep water
samples, as recommended by Beman et al. (2008). Alternatively,
Thaumarchaeota abundance in DNA extracted from our samples,
estimated by qPCR of rrs, agreed well with an independent assessment
based on pyrosequencing. This suggests that the 334F/534R rrs primer
set originally proposed by Suzuki et al. (2000) for quantifying
Marine Group 1 Archaea may be more robust than amoA primer sets for
quantifying Thaumarchaeota.
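Figure A8 compares primer sequences with amoA alignments to locate mismatches within primer binding sites. A minimal sketch of that comparison is shown below, assuming the primer and the target region are already aligned position by position (same length, same strand; a reverse primer would first be compared against the reverse complement of the template). It reports positions where the template base is not among the bases allowed by the primer's IUPAC degeneracy code. The primer and site strings are made up for illustration and are not the Wuchter or Francis primer sequences.

```python
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_mismatches(primer: str, site: str) -> list[int]:
    """Return 0-based positions where the template site is not matched by the primer."""
    if len(primer) != len(site):
        raise ValueError("primer and binding site must be the same length")
    mismatches = []
    for i, (p, t) in enumerate(zip(primer.upper(), site.upper())):
        allowed = IUPAC.get(p)
        if allowed is None:
            raise ValueError(f"unknown IUPAC code {p!r} at position {i}")
        if t not in allowed:
            mismatches.append(i)
    return mismatches


def three_prime_mismatches(primer: str, site: str, window: int = 5) -> int:
    """Count mismatches within the last `window` bases (the primer's 3' end)."""
    cutoff = len(primer) - window
    return sum(1 for i in primer_mismatches(primer, site) if i >= cutoff)


primer = "ACGTRYACGTN"    # hypothetical degenerate primer
site_ok = "ACGTATACGTG"   # hypothetical fully matched template
site_mm = "ACGTCTACGAG"   # hypothetical template with two mismatches
print(primer_mismatches(primer, site_ok))       # []
print(primer_mismatches(primer, site_mm))       # [4, 9]
print(three_prime_mismatches(primer, site_mm))  # 1
```

Mismatches near the 3' end are commonly considered more disruptive to extension than those near the 5' end, which is why the second function counts them separately.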

The accA gene, a proposed marker for archaeal autotrophy, was found
at abundances almost equal to those of Thaumarchaeota rrs and amoA
(amplified by the Francis primers) below 100 m depth (Table 2), in
agreement with findings from the original accA survey of the
Tyrrhenian Sea (Yakimov et al., 2009). accA was least abundant in
surface water samples (2-70 m depth; e.g., Figure 2), especially at
inshore stations and in the Mississippi River. A similar trend has
been reported for South China Sea samples, where accA approached the
limit of detection in samples from <100 m (Hu et al., 2011). Since
the accA primers were designed using a very small database, the
apparent discrepancy between accA and Thaumarchaeota rrs abundance in
near-surface samples may be due to the presence of populations in
surface waters with divergent accA sequences that are not detected by
this primer set.

COMMUNITY COMPOSITION 

We identified a number of clades that appear to be unique to the
northern Gulf of Mexico. These were seen in rrs genes from both clone
libraries and pyrosequencing reads (e.g., D5-200m-66 [KC330801], -71
[KC330804], -85 [KC330810]; D5-100m-15 [KC330788]; Figure 5), in amoA
gene sequences (e.g., D5-100m-amoA-21 [KC349156], -35 [KC349170], -41
[KC349176], -51 [KC349185]; D5-200m-amoA-30 [KC349251], -44
[KC349264]; Figure A1A in Appendix), and in accA gene sequences (e.g.,
D5-2m-accA-05 [KC349402], -44 [KC349436]; D5-50m-accA-53 [KC349545];
D5-100m-accA-21 [KC349333], -29 [KC349340], -47 [KC349355];
D5-200m-accA-11 [KC349365], -27 [KC349380], -36 [KC349389], -41
[KC349393]; D5-450m-accA-20 [KC349475], -26 [KC349480]; Figure A1B in
Appendix). Since the global distribution of accA genes has not been
thoroughly surveyed, it is difficult to determine whether these
clades are indeed unique to the Gulf of Mexico. Generally, the
sub-populations of Thaumarchaeota represented by distinct OTUs of
each gene grouped according to sample depth. The most stringent
segregation by depth was observed for rrs and accA, which segregated
into deep (200 and 450 m) and near-surface (2, 50, and 100 m) OTUs,
as has been observed elsewhere for amoA (Francis et al., 2005; Beman
et al., 2008, 2010; Kalanetra et al., 2009; Church et al., 2010;
Santoro et al., 2010). Archaeal amoA phylotypes retrieved from
Station D5 were also distributed according to sample depth (Figure
6A), with a near-surface "Group A" and a deep "Group B" (Francis et
al., 2005). Since the distributions of these genes were determined by
independent PCR amplifications, it is not possible to directly
associate rrs, amoA, and accA genotypes in our samples; however, the
coincident groupings of these three markers of completely different
physiological functions suggest differentiation of these
Thaumarchaeota populations at a genomic level. UniFrac analysis
suggests that Thaumarchaeota populations at these stations resolve
into three sub-populations, segregated by depth and by factors
covarying with depth, with the strongest separation between surface
(depth <100 m) and deep-water populations (Figure 9; Figures A4 and
A7 in Appendix).

A few of the accA gene sequences retrieved from Station D5 clustered
with previously defined ecotypes of the "Deep Water accA Clade"
(Yakimov et al., 2009, 2011), referred to here as Deep Ecotypes 1a,
1b, and 2 (Figure 6B). Inferred amino acid sequences of all but 8 of
the 87 accA amplicons we retrieved from 200 and 450 m grouped into
Deep Ecotype 2. No representatives of Deep Ecotypes 1a or 1b were
identified, although a group of more divergent sequences similar to
these ecotypes was evident (Figure 6B). Since previous studies
concentrated on samples from deeper waters, we have added
Near-Surface Ecotypes 1a and 1b to the "Shallow Water accA Clade"
(Yakimov et al., 2011). Both of the Sargasso Sea reference sequences
from this clade fit into Ecotype 1a, which contained only sequences
from near-surface waters (<100 m) of the northern Gulf of Mexico. The
accA sequence from "Ca. Nitrosopumilus maritimus" SCM1 (Walker et
al., 2010) grouped with marine sediment clones and with "Ca.
Nitrosoarchaeum limnia" SFB1 (Blainey et al., 2011); we have thus
allocated these sequences to a "Nitrosopumilus-like group." We also
note a distinct lineage of accA (OTU 2, "Near-Surface Ecotype 1b";
Figure 6B) containing sequences from the northern Gulf of Mexico and
the South China Sea ("Shallow group 11" in Hu et al., 2011). The
sequences we retrieved extend the coverage of accA environmental
sequence diversity to near-surface sites and provide additional
references for refining ecotype characterizations as more sequences
are added to the databases.
CONCLUSION

AOA and Thaumarchaeota were abundant in the northern Gulf of Mexico
coastal waters we sampled, accounting for up to 40% (qPCR) or 54%
(pyrosequencing) of the total prokaryotic community and outnumbering
AOB by 10- to 100-fold. The ratio of NOB to AOA in our samples was
lower than reported in other studies, suggesting that other pathways
for nitrite removal may be more important in the northern Gulf of
Mexico than elsewhere. A diverse community of Thaumarchaeota was
observed at Station D5 near the Mississippi River plume in clone
libraries constructed from archaeal genes of interest (rrs, amoA, and
accA), with clades that seem to be unique to waters of the northern
Gulf of Mexico. Consistent with this observation, and in contrast to
studies of many other coastal waters, the amoA sequence most similar
to Nmar_1500, the amoA gene from "Ca. N. maritimus" strain SCM1, was
only 91% similar. Through analysis of rrs sequences generated using
454 pyrosequencing, we observed distinct clades of Thaumarchaeota
that were distributed primarily by depth, with clear differences
between near-surface (<100 m) and deep (>100 m) populations. The
distribution of rrs sequences in clone libraries generated from
samples collected at Station D5 was consistent with this pattern,
suggesting that the parallel differences in the composition of
Thaumarchaeota populations defined by other genes at this station
may also apply to the rest of the northern Gulf of Mexico. Finally,
we found correlations between the abundances of Thaumarchaeota genes
in this region and environmental variables including depth,
temperature, turbidity, pH, and oxygen; however, the manner in which
these variables influence Thaumarchaeota metabolism, and thus
distribution, remains unclear.
APPENDIX
METHODS 
qPCR standards 

Standards for qPCR reactions were constructed as in Kalanetra et al.
(2009). Briefly, environmental DNA was amplified using gene-specific
sequencing primers (Thaumarchaeota rrs, Archaeal amoA, Bacterial
amoA) or qPCR primers (accA) under standard PCR conditions. For
Bacterial rrs qPCR, E. coli genomic DNA was used. The resulting PCR
product was loaded onto an agarose gel and electrophoresed, and a
band of the expected product size was excised. This band was purified
using the QIAquick® Gel Extraction Kit (QIAGEN), inserted into a TOPO
4 vector (Invitrogen), and cloned into E. coli TOP10 chemically
competent cells following the manufacturer's instructions. Clones
were selected at random and sequenced to check insert specificity.
Those with positive insertions were grown overnight in LB broth with
ampicillin, and plasmids were extracted using the QIAprep Spin
Miniprep Kit® (QIAGEN). Plasmids were linearized using the
restriction enzyme NotI (New England Biolabs), then purified in the
same manner as the PCR products above. Concentrations of linearized
plasmid DNA were measured with the Quant-iT™ PicoGreen® dsDNA reagent
(Invitrogen) using a Picofluor handheld fluorometer (Turner Designs).
Gene concentrations were calculated from the measured DNA
concentrations, plasmid length, and insert sequence length. Standards
were then diluted to a range of 10^ to 10^ copies µL^-1 for each
reaction.
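The conversion from measured plasmid DNA concentration to gene copies can be sketched as follows. This assumes the widely used approximation of 650 g mol⁻¹ per base pair of double-stranded DNA (some protocols use 660), one gene insert per linearized plasmid, and a 10-fold dilution series; the vector and insert lengths in the usage lines are hypothetical, not the values used in this study.

```python
AVOGADRO = 6.02214076e23   # molecules per mole
BP_MASS_G_PER_MOL = 650.0  # assumed average mass of one double-stranded base pair


def copies_per_ul(dna_ng_per_ul: float, vector_bp: int, insert_bp: int) -> float:
    """Gene copies per microliter of a linearized plasmid standard (one insert per plasmid)."""
    length_bp = vector_bp + insert_bp
    grams_per_ul = dna_ng_per_ul * 1e-9
    moles_per_ul = grams_per_ul / (length_bp * BP_MASS_G_PER_MOL)
    return moles_per_ul * AVOGADRO


def dilution_plan(stock_copies_per_ul: float, high_exp: int, low_exp: int) -> list[tuple[float, float]]:
    """(target copies per uL, fold dilution from stock) for 10**high_exp down to 10**low_exp."""
    if stock_copies_per_ul < 10.0**high_exp:
        raise ValueError("stock is less concentrated than the highest standard")
    plan = []
    for exp in range(high_exp, low_exp - 1, -1):
        target = 10.0**exp
        plan.append((target, stock_copies_per_ul / target))
    return plan


stock = copies_per_ul(dna_ng_per_ul=1.0, vector_bp=4000, insert_bp=500)  # hypothetical plasmid
for target, fold in dilution_plan(stock, high_exp=6, low_exp=1):
    print(f"{target:.0e} copies/uL: dilute stock {fold:.3g}-fold")
```

In practice each standard is usually made by serial 10-fold transfers rather than single large dilutions from the stock; the fold values above are the cumulative dilution each standard represents.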

Thaumarchaeota hcd gene assay 

In addition to accA, another gene in the
3-hydroxypropionate/4-hydroxybutyrate pathway, hcd, encoding the
enzyme 4-hydroxybutyryl-CoA dehydratase, has been suggested as a
potential marker for carbon fixation in Thaumarchaeota (Offre et al.,
2011). Primers for this gene have been developed and tested on soil
Thaumarchaeota populations (Offre et al., 2011). We explored using
these primers to quantify hcd abundance in our samples but were
unable to obtain the desired amplification specificity (assessed by
agarose gel electrophoresis followed by cloning and sequencing of
putative amplicons; see Results below).

Gene abundance and ratio calculations 

The number of gene copies detected by qPCR (copies per reaction) was
converted to environmental concentrations (copies L^-1) using the
original sample volume filtered (~1 L), the portion of the lysate
purified (800 of 2000 µL), the final volume of the purified extract
(50 µL; we also measured the DNA concentration in this extract), and
the portion of the purified DNA extract used in each qPCR reaction
(2 µL). This calculation assumes that all bacterioplankton cells were
collected on the filter, that the DNA contribution from eukaryotes
was negligible, and that all of the DNA from all of the cells
collected on each filter was released into the lysate, then extracted
and purified from the lysate and detected by qPCR with 100%
efficiency (see discussion and calibration in Kalanetra et al.,
2009). The contribution of Thaumarchaeota to the prokaryotic
population was estimated from Thaumarchaeota and Bacteria rrs
abundance by assuming 1.8 rrs per Bacteria genome (Biers et al.,
2009) and 1.0 rrs per Thaumarchaeota genome (IMG database); for NOB
we assumed 2.0 rrs per genome (Mincer et al., 2007). Thaumarchaeota
abundance was then divided by the total prokaryotic abundance
(Bacteria plus Thaumarchaeota; Euryarchaeota were present in some
samples but were never abundant, see below, and were not measured by
qPCR) to calculate the contribution of Thaumarchaeota cells to the
prokaryotic community. Ratios of gene abundance in a given sample
were calculated directly from the qPCR data (copies µL^-1 of
extract).
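The scaling steps above can be written out as a short calculation. The default volumes are taken from the text (2 µL template, 50 µL extract, 800 of 2000 µL lysate purified, ~1 L filtered), and the gene dosages (1.0 rrs per Thaumarchaeota genome, 1.8 per Bacteria genome) likewise; in practice the actual filtered volume of each sample would replace the nominal 1 L. As stated in the Methods, 100% recovery is assumed at every step. Function names are illustrative.

```python
def copies_per_liter(
    copies_per_reaction: float,
    template_ul: float = 2.0,
    extract_ul: float = 50.0,
    lysate_purified_ul: float = 800.0,
    lysate_total_ul: float = 2000.0,
    volume_filtered_l: float = 1.0,
) -> float:
    """Scale qPCR copies per reaction to copies per liter of seawater (100% recovery)."""
    copies_in_extract = copies_per_reaction * extract_ul / template_ul
    copies_in_lysate = copies_in_extract * lysate_total_ul / lysate_purified_ul
    return copies_in_lysate / volume_filtered_l


def thaum_fraction(
    thaum_rrs: float,
    bact_rrs: float,
    rrs_per_thaum: float = 1.0,
    rrs_per_bacterium: float = 1.8,
) -> float:
    """Thaumarchaeota cells as a fraction of Bacteria plus Thaumarchaeota cells."""
    thaum_cells = thaum_rrs / rrs_per_thaum
    bact_cells = bact_rrs / rrs_per_bacterium
    return thaum_cells / (thaum_cells + bact_cells)


print(copies_per_liter(1000))         # 62500.0
print(thaum_fraction(4.0e6, 1.8e7))  # ~0.286 (4e6 archaeal vs 1e7 bacterial cells)
```

With these defaults every copy counted in a reaction represents 62.5 copies per liter (a factor of 25 from extract to reaction, times 2.5 from lysate to purified portion), which is worth keeping in mind when interpreting detection limits.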

BEST analysis 

BEST analysis was performed both for all samples collected and for
the subset of samples for which nutrient data were available.
Nutrient data were collected by researchers interested in modeling
phytoplankton growth and thus were only available for near-surface
samples. For these samples, gene abundances were log-transformed and
resemblance distances for each gene between samples were calculated
using Bray-Curtis similarity; resemblances for environmental data
were calculated using Euclidean distance. The resulting similarity
matrices were combined and analyzed with Biota and/or Environment
matching (BioEnv) through the BEST procedure (Clarke, 1993) in PRIMER
(Clarke and Gorley, 2006). The significance of BEST results for each
gene was tested using 999 permutations, and the null hypothesis of no
species-environment relationship was rejected for all results with
p < 0.001.

RESULTS 
Gene ratios 

Ratios of archaeal amoA:Thaumarchaeota rrs ranged from 0.001
(B5-760 m) to 6.6 (G1-15 m) when the Wuchter primers were used to
quantify Thaumarchaeota amoA (Table 2; Figure A5B). Low amoA:rrs
ratios tended to coincide with deep (>100 m) samples (Table 2; Figure
A5A). In contrast, amoA:rrs ratios ranged from 0.002 to 1.9, with an
average of 0.5, when Thaumarchaeota amoA abundance was estimated
using the Francis primers. The Francis primer set detected more amoA
genes below 200 m depth, sometimes up to 1000 times more than the
Wuchter primer set (Figure A6). Conversely, estimates of amoA
abundance in near-surface (<100 m) samples using the Wuchter primers
were 10- to 100-fold greater than estimates based on the Francis
primers (Figure A6). Ratios of accA:Thaumarchaeota rrs ranged from
0.0002 to 1.3 (Table 2). We detected the fewest copies of accA per
Thaumarchaeota rrs in near-surface (<100 m) samples (Figure A5B).

Thaumarchaeota hcd genes 

hcd PCR products were also obtained using the primer set from Offre
et al. (2011). However, the hcd primers yielded three bands of ~200,
~350, and ~400 bp by agarose gel electrophoresis. Analysis of
sequences from the ~200 bp band indicated non-specific amplification,
so these sequences are not considered further. Sequences from the
~350 and ~400 bp bands were most similar to hcd from Thaumarchaeota
(BLASTx against the RefSeq database). Since non-specific
amplification prevented reliable qPCR quantification of hcd in our
samples, we did not pursue this marker further. The sequences
obtained from the ~350 and ~400 bp bands have been submitted to
GenBank (NCBI) under accession numbers KC409223 to KC409237.
Community composition

As expected, phylogenetic analysis of amoA nucleotide sequences
(Figure A1A) revealed more diversity than was apparent in inferred
amino acid sequences, with 47 OTUs (97% similarity cutoff; Table A4)
identified in the 100 and 200 m samples from Station D5. Seventeen of
the 47 amoA OTUs contained only sequences from 100 m (1-22 sequences
in each OTU), while 23 OTUs contained only sequences from 200 m (1-8
sequences in each). The amoA sequence most similar to either
"Candidatus Nitrosopumilus maritimus" strain SCM1 or Nitrosopumilus
sp. NM25 was obtained from the 100 m library, and it was only 91%
similar to either sequence.

The accA nucleotide alignment contained 51 OTUs (97% similarity
cutoff; Table A4) that clustered primarily by depth (Figure A1B).
Sequences from deep samples (200 and 450 m) were assigned to 26 OTUs
(1-13 sequences in each); only 4 of these OTUs contained any
near-surface (2, 50, or 100 m) sequences. Twenty-one of the OTUs
contained sequences exclusively from near-surface (<100 m) samples
(1-34 sequences in each). Almost half (101) of the sequences we
retrieved were at least 77% similar to accA from "Ca. N. maritimus"
strain SCM1; all of these were retrieved from near-surface waters
except for six sequences from 200 m.

BEST analysis 

Gene abundances determined by qPCR were compared to environmental data using the BEST procedure (Clarke, 1993). Results of this analysis (Table A7) show that abundances of Bacterial amoA, Archaeal accA, and pSL12 rrs, but not Bacterial rrs, were significantly correlated with fluorescence (chlorophyll a). Abundances of both Thaumarchaeota and Bacteria rrs were correlated with beam attenuation (turbidity), in combination with salinity and either fluorescence or temperature. Archaeal amoA abundance correlated with latitude, fluorescence, and salinity. Interestingly, BEST analysis (Table A7A) showed that the best-fitting variable set for amoA abundance estimated with the Wuchter et al. (2006) primers included temperature (ρ = 0.442; p < 0.001), while the set for amoA abundance estimated with the Francis et al. (2005) primers included oxygen concentration instead (ρ = 0.474; p < 0.001).
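BEST (BIO-ENV) searches subsets of environmental variables for the one whose between-sample distance matrix best agrees, by Spearman rank correlation, with the biotic distance matrix (Clarke, 1993). The sketch below is a minimal exhaustive version of that search, written from the general description rather than from PRIMER's implementation. It assumes the environmental columns are already normalised and, as a simplification, uses Euclidean distance for both the gene-abundance vectors and the environmental subsets; all function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pairwise_euclidean(rows):
    """Euclidean distances for every sample pair (i < j), in a fixed order."""
    n = len(rows)
    return [
        sqrt(sum((a - b) ** 2 for a, b in zip(rows[i], rows[j])))
        for i in range(n)
        for j in range(i + 1, n)
    ]


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var = sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var) if var > 0 else 0.0


def bio_env(biotic, env, names, max_vars=4):
    """Exhaustive BEST/BIO-ENV-style search over environmental variable subsets.

    biotic: per-sample abundance vectors; env: per-sample rows of normalised
    environmental variables; names: labels for the env columns.
    Returns (best rho, tuple of variable names).
    """
    target = pairwise_euclidean(biotic)
    best_rho, best_set = float("-inf"), ()
    for k in range(1, min(max_vars, len(names)) + 1):
        for subset in combinations(range(len(names)), k):
            cols = [[row[c] for c in subset] for row in env]
            rho = spearman(target, pairwise_euclidean(cols))
            if rho > best_rho:
                best_rho, best_set = rho, tuple(names[c] for c in subset)
    return best_rho, best_set
```

The search grows combinatorially with the number of variables, which is one reason the number of variables per subset is usually capped (here by the hypothetical `max_vars`).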

Nutrient data (including nitrite, nitrate, ammonia, phosphate, and silicate; provided by S. Lohrenz) were only available for near-surface samples. Gene abundances for Bacterial rrs, pSL12 rrs, and Archaeal amoA amplified with Wuchter primers (Table A7B) correlated with silicate in combination with other variables, although only the Bacterial rrs result was significant (p < 0.001). Only the results with the highest Spearman's rank correlation coefficient (ρ) are shown in Table A7B; however, weaker correlations to nutrients were found among the second-highest-ranked results. Archaeal amoA amplified with Francis primers (ρ = 0.446; p < 0.010; data not shown) and Bacterial rrs (ρ = 0.583; p < 0.001; data not shown) were correlated with nitrate, while Thaumarchaeota rrs (ρ = 0.409; p < 0.018; data not shown) correlated with nitrite and silicate together.

DISCUSSION 
Community composition 

Almost all of the near-surface (<100 m) Thaumarchaeota rrs sequences were >98% similar to the rrs from "Ca. N. maritimus" strain SCM1, as well as to Nitrosopumilus sp. NM25, retrieved from sand taken from a Zostera seagrass bed (Matsutani et al., 2011). The group containing these sequences included a sequence retrieved from cloned PCR amplicons sequenced from a tidal creek (the Duplin River) adjacent to Sapelo Island, Georgia (Hollibaugh et al., 2011), as well as "Ca. Nitrosoarchaeum limnia" strain SFB1 (Blainey et al., 2011), which was enriched from a sample taken in the oligohaline reach of North San Francisco Bay. This contrasts with clones recovered from 200 m in the northern Gulf of Mexico, where sequences were distributed among 9 OTUs, indicating a richer community of Thaumarchaeota (in agreement with the Shannon index of these samples calculated from pyrosequencing data; Table A4). We did not recover any clones related to the pSL12-like clade at Station D5, which is consistent with their low rrs abundance as estimated by qPCR.

A nucleotide alignment of accA genes from this study produced a phylogenetic tree (Figure A1B) that supported the groupings found in trees generated from inferred amino acid alignments (Figure 6B); however, some sequences from Station D5 (mostly from 200 m depth) clustered with representatives of Deep Ecotype 1a at the nucleotide level. Additionally, a novel deep cluster of sequences from the Gulf of Mexico and the South China Sea was identified ("Deep Ecotype 3"; Figure A1B).

Gene ratios 

High ratios of amoA:Thaumarchaeota rrs genes at certain stations (Figure A5A; Table 2) could indicate a population of AOA with multiple amoA copies per genome, or the presence of a group of Archaea that are not detected by the rrs primer set we used (e.g., Beman et al., 2008; Teske and Sørensen, 2008) but that contain a homolog of the amoA gene (for example, the pSL12-like clade). The latter seems less likely for pSL12 in particular, given the low abundance of rrs from this group at most stations in the northern Gulf of Mexico. However, in the Mississippi River at station MR1 (salinity of 0), the abundance of pSL12 rrs genes was equal to Archaeal amoA gene abundance, regardless of the amoA primer set used, while Thaumarchaeota rrs genes were undetectable. Low ratios of amoA:rrs have been proposed to indicate a potential for heterotrophy in Thaumarchaeota (Agogué et al., 2008; De Corte et al., 2009; Kalanetra et al., 2009); however, this has yet to be confirmed definitively and may simply reflect depth-dependent shifts in sub-populations that affect our ability to quantify them by qPCR, as shown by Beman et al. (2008) and others.

Our data indicate that the ratio of amoA:rrs gene abundance decreases with depth; however, we also observed increases in the accA:rrs ratio for deeper waters (Figure A5B). The amoA:accA:rrs ratios we found are not consistent with the 1:1:1 ratio expected from the "Ca. N. maritimus" strain SCM1 genome (Walker et al., 2010). In samples <100 m, this ratio is 1.8:0.1:1 or 0.5:0.1:1, while deeper samples show 0.2:0.6:1 or 0.6:0.6:1, depending on whether the Wuchter or Francis amoA primer set was used (Table 2). In deeper waters, where Thaumarchaeota rrs are most abundant, the Francis primers produce ratios most similar to those found in "Ca. N. maritimus" strain SCM1. A direct comparison of amoA abundances in our samples as determined by the Wuchter versus Francis primer sets (Figure A6) demonstrates this clearly.
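The ratios quoted above express amoA and accA abundance relative to Thaumarchaeota rrs, so a genome such as SCM1 with one copy of each gene would give 1:1:1. A minimal per-sample sketch of that normalisation follows; the copy numbers in the usage line are hypothetical values chosen only to show the arithmetic (they are not taken from Table A2), and Table 2 may aggregate ratios across samples differently.

```python
def gene_ratios(amoA: float, accA: float, rrs: float) -> tuple[float, float, float]:
    """Express amoA and accA abundances (copies/L) relative to Thaumarchaeota rrs."""
    if rrs <= 0:
        raise ValueError("rrs abundance must be positive to normalise")
    return (round(amoA / rrs, 1), round(accA / rrs, 1), 1.0)


# Hypothetical abundances in copies/L, for illustration only.
print(gene_ratios(amoA=3.6e6, accA=2.0e5, rrs=2.0e6))  # (1.8, 0.1, 1.0)
```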
Table A1 | Primers used in this study.

Target gene* | Primer/probe† | Sequence (5'→3') | Application | Detection limit | qPCR efficiency | Reference
Archaeal rrs | 21F; 958R | TTCCGGTTGATCCYGCCGGA; YCCGGCGTTGAMTCCAATT | PCR and sequencing | N/A | N/A | DeLong (1992)
Thaumarchaeal rrs | G1_334F; G1_554R; TM519AR | AGATGGGTACTGAGACACGGAC; CTGTAGGCCCAATAATCATCCT; TTACCGCGGCGGCTGGCAC | qPCR | 4.08 × 10^? copies | 96.5-112.7% | Suzuki et al. (2000)
pSL12 rrs | pSL12_750F; pSL12_876R | GGTCCRCCAGAACGCGC; GTACTCCCCAGGCGGCAA | qPCR | 1.07 × 10^? copies | 96.9-103.1% | Mincer et al. (2007)
Bacterial rrs | BACT1369F; PROK1492R; TM1389F | CGGTGAATACGTTCYCGG; GGWTACCTTGTTACGACTT; CTTGTACACACCGCCCGTC | qPCR | 1.14 × 10^? copies | 91.6-113.2% | Suzuki et al. (2000)
Universal rrs | 515F; 806R§ | GCCTTGCCAGCCCGCTCAGGTGTGCCAGCMGCCGCGGTAA; GCCTCCCTCGCGCCATCAGNNNNNNNNNNNNGGGGACTACVSGGGTATCTAAT | Pyrosequencing | N/A | N/A | King et al. (2013)
Archaeal amoA (W)‡ | Arch-amoA-for; Arch-amoA-rev | CTGAYTGGGCYTGGACATC; TTCTTCTTTGTTGCCCAGTA | qPCR | 1.44 × 10^? copies | 95.1-103.6% | Wuchter et al. (2006)
Archaeal amoA (F)‡ | ArchamoAF; ArchamoAR | STAATGGTCTGGCTTAGACG; GCGGCCATCCATCTGTATGT | qPCR and sequencing | 1.79 × 10^? copies | 88.4-96.0% | Francis et al. (2005)
Bacterial amoA | amoA-1F; amoA-r New | GGGGTTTCTACTGGTGGT; CCCCTCBGSAAAVCCTTCTTC | qPCR | 1.63 × 10^? copies | 88.4-95.6% | Rotthauwe et al. (1997); Hornek et al. (2006)
Archaeal accA | Crena_529F; Crena_981R | GCWATGACWGAYTTTGTYRTAATG; TGGWTKRYTTGCAAYTATWCC | qPCR and sequencing | 1.25 × 10^? copies | 80.5-91.1% | Yakimov et al. (2009)
Archaeal hcd | hcd-?F (S); hcd-911F (Q); hcd-1267R (S,Q) | GGHGGTGCWATGACTGAT; AGCTATGTBTGCAARACAGG; CTCATTCTGTTTTCHACATC | PCR, qPCR, and sequencing | N/A | N/A | Offre et al. (2011)
*rrs, 16S rRNA gene; amoA, ammonia monooxygenase gene (the Bacterial amoA primers only amplify amoA genes from β-Proteobacteria); accA, biotin-dependent acetyl-CoA/propionyl-CoA carboxylase gene; hcd, 4-hydroxybutyryl-CoA dehydratase gene.
†(S), sequencing; (Q), qPCR; TM, TaqMan probe.

‡Archaeal amoA (W) and (F) refer to the Wuchter et al. (2006) and Francis et al. (2005) primer sets, respectively (as mentioned in text).
§For primer 806R, N's in the sequence denote the barcode region.
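Table A1 reports qPCR efficiencies as percentages. The text does not say how they were derived; the usual convention, assumed here, fits a standard curve of Ct against log10 template copies and converts its slope with E = 10^(-1/slope) - 1, so a slope near -3.32 corresponds to about 100% (template doubling every cycle). The dilution series below is hypothetical.

```python
def fit_slope(log10_copies, ct):
    """Ordinary least-squares slope of Ct against log10(template copies)."""
    n = len(ct)
    mx = sum(log10_copies) / n
    my = sum(ct) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(log10_copies, ct))
    sxx = sum((x - mx) ** 2 for x in log10_copies)
    return sxy / sxx


def efficiency_percent(slope):
    """Amplification efficiency from a standard-curve slope: (10**(-1/slope) - 1) * 100."""
    return (10 ** (-1.0 / slope) - 1.0) * 100.0


# Hypothetical 10-fold dilution series: Ct rises by 3.32 cycles per 10-fold dilution.
logs = [7, 6, 5, 4, 3]
cts = [12.0, 15.32, 18.64, 21.96, 25.28]
slope = fit_slope(logs, cts)
print(round(slope, 2), round(efficiency_percent(slope), 1))
```

Efficiencies above 100%, such as the upper ends of some ranges in Table A1, are commonly attributed to inhibition in concentrated standards or to non-specific products, which is one reason ranges rather than single values are reported across runs.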
Table A2 | qPCR estimates of the abundance of rrs, amoA, and accA genes in samples from the northern Gulf of Mexico.

Station ID | Depth (m) | Amount filtered (L) | Wuchter Arch amoA (copies/L) | Francis Arch amoA (copies/L) | Arch accA (copies/L) | Thaum rrs (copies/L) | pSL12 rrs (copies/L) | AOB amoA (copies/L) | Bacteria rrs (copies/L) | Thaum % of total prokaryotes

[Data rows not recoverable from the source text.]
ND, not determined; qPCR abundance undetectable for this specific gene in this sample.

LD, limit of detection; the sample ran below the limit of detection with high variability in the assay.

*Some values below the limit of detection of our assay (italicized) are included here because these values had low standard deviation among replicates.
Table A4 | Diversity indices for sequenced clones obtained from Station D5, calculated using mothur (v. 1.21.1; Schloss et al., 2009).

Library | Observed | Chao | ACE | Shannon | Simpson
accA | 51 | 74.0 | 109 | 3.02 | 0.0862
amoA | 47 | 86.4 | 107 | 3.19 | 0.0615
amoA 100 m | 22 | 35.2 | 55.7 | 2.39 | 0.139
amoA 200 m | 30 | 39.4 | 41.8 | 2.89 | 0.0801
rrs | 10 | 11.0 | 12.8 | 1.30 | 0.451
rrs 100 m | 2 | 2.00 | 0.000 | 0.103 | 0.957
rrs 200 m | 9 | 10.0 | 11.8 | 1.96 | 0.141
rrs 454 | 2768 | 18700 | 57100 | 4.15 | 0.0654

OTU similarity cutoffs were 2% (rrs) or 3% (amoA, accA). Statistics for rrs sequences ("rrs 454") obtained from pyrosequencing are included for comparison.
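For readers reproducing indices like those in Table A4, a minimal sketch of three of them follows. The exact formulas mothur applies are an assumption here: Shannon's H' with natural logarithms, Simpson's index as Σn(n-1)/(N(N-1)) (higher values mean lower diversity), and the bias-corrected Chao1 estimator S_obs + F1(F1-1)/(2(F2+1)), where F1 and F2 are the numbers of singleton and doubleton OTUs. ACE is omitted for brevity, and the OTU counts are hypothetical.

```python
from math import log


def shannon(counts):
    """Shannon index H' using natural logarithms."""
    total = sum(counts)
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


def simpson(counts):
    """Simpson's index D = sum n(n-1) / (N(N-1)); larger values mean lower diversity."""
    total = sum(counts)
    return sum(c * (c - 1) for c in counts) / (total * (total - 1))


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1(F1 - 1) / (2(F2 + 1))."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical OTU abundance vector (sequences per OTU), for illustration only.
otu_counts = [10, 5, 2, 1, 1]
print(round(shannon(otu_counts), 3), round(simpson(otu_counts), 3), chao1(otu_counts))
```

The bias-corrected Chao1 form stays finite when there are no doubletons, which matters for small clone libraries such as the two-OTU rrs 100 m library.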



Table A5 | Variables contributing to principal components axes.

Variable | PC1 | PC2
Latitude (°N) | -0.26 | +0.30
Longitude (°W) | +0.066 | +0.24
Depth (m) | +0.39 | +0.21
Temperature (°C) | -0.36 | -0.38
Salinity (PSU) | +0.19 | -0.49
Dissolved oxygen (mg/L) | -0.36 | -0.12
Rel. fluorescence (ng/L) | -0.39 | +0.15
Beam attenuation (1/m) | -0.21 | +0.44
pH (NBS) | -0.42 | -0.32
Euphotic depth (m) | +0.31 | -0.30

Coefficients measure the contribution of each variable to each principal component axis (PC1 and PC2): the larger the absolute value, the greater the influence of the variable. The sign indicates whether the variable is positively or negatively correlated with the axis. PC1 explained 38.9% of the total variance and PC2 explained 24.3%. Depth, water column depth; Rel. fluorescence, relative fluorescence (chlorophyll a equivalents); beam attenuation, turbidity; euphotic depth, photic zone depth.
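The percentages of variance in the Table A5 note follow from the eigenvalues of the PCA. Assuming the variables were standardised before the PCA (likely given their mixed units, but not stated in the text), the total variance equals the number of variables, so the percentage explained by an axis is its eigenvalue divided by that number. The sketch below computes PC1 by power iteration on the correlation matrix; it is a hypothetical illustration, not the software used in the study, and later axes would require deflation or a full eigendecomposition.

```python
from math import sqrt


def standardise(columns):
    """Centre each variable and divide by its sample standard deviation."""
    result = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        result.append([(v - mean) / sd for v in col])  # fails for a constant variable
    return result


def correlation_matrix(columns):
    """Pearson correlation matrix of the variables (one list per variable)."""
    z = standardise(columns)
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]


def leading_component(matrix, iterations=1000):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix (power iteration)."""
    vec = [1.0 / (i + 1) for i in range(len(matrix))]  # arbitrary non-uniform start
    for _ in range(iterations):
        nxt = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
        norm = sqrt(sum(x * x for x in nxt))
        if norm == 0:
            return 0.0, vec
        vec = [x / norm for x in nxt]
    # Rayleigh quotient v^T M v for the converged unit vector
    eigenvalue = sum(v * sum(m * w for m, w in zip(row, vec)) for v, row in zip(vec, matrix))
    return eigenvalue, vec


def pc1_variance_percent(columns):
    """Percent of total variance on PC1 of standardised data (trace of R = number of variables)."""
    r = correlation_matrix(columns)
    eigenvalue, loadings = leading_component(r)
    return 100.0 * eigenvalue / len(r), loadings
```

The sign of the returned loadings is arbitrary, which is why only the relative signs within a column of Table A5 carry meaning.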



Table A6 | Results of CCA analysis of the relationship between qPCR-estimated gene abundances and environmental data in the northern Gulf of Mexico.

Axes | CCA1 | CCA2 | CCA3 | CCA4
Eigenvalues | 0.148 | 0.135 | 0.012 | 0.008
Gene-environment correlations | 0.655 | 0.765 | 0.352 | 0.230
Cumulative percentage variance
Of gene abundance data | 17.0 | 32.6 | 34.0 | 34.9
Of gene-environment relation | 47.9 | 91.7 | 95.7 | 98.3

Values for all four canonical axes are shown, but only CCA1 and CCA2 were used to construct a biplot of the data (Figure 8).
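The two "cumulative percentage variance" rows in Table A6 are running sums of the axis eigenvalues divided by different totals: the row for gene abundance data divides by the total inertia, and the row for the gene-environment relation divides by the sum of all canonical eigenvalues (the usual CANOCO-style convention, assumed here). Neither total is reported in Table A6, so the usage line below uses a placeholder total purely to show the arithmetic.

```python
def cumulative_percent(eigenvalues, total):
    """Running sum of eigenvalues as a percentage of `total` (inertia or canonical sum)."""
    out, running = [], 0.0
    for ev in eigenvalues:
        running += ev
        out.append(round(100.0 * running / total, 1))
    return out


# Eigenvalues from Table A6; placeholder_total is hypothetical, not a value from this study.
placeholder_total = 1.0
print(cumulative_percent([0.148, 0.135, 0.012, 0.008], placeholder_total))
```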
Table A7 | Results of BEST analysis comparing gene abundance to (A) environmental factors and (B) environmental factors and nutrients (only near-surface samples were used with nutrients).

(A)
Gene | Number of variables | Correlation (ρ*) | Contributing environmental variables
Thaumarchaeota rrs | 3 | 0.507 | Salinity, RF, beam attenuation
pSL12 rrs | 1 | 0.457 | RF
Bacterial rrs | 3 | 0.613 | Temperature, salinity, beam attenuation
Archaeal amoA W | 4 | 0.442 | Latitude, temperature, salinity, RF
Archaeal amoA F | 4 | 0.474 | Latitude, salinity, oxygen, RF
Bacterial amoA | 1 | 0.462 | RF
Archaeal accA | 1 | 0.460 | RF

(B)
Gene | Number of variables | Correlation (ρ*) | p-value† | Contributing environmental variables and nutrients
Thaumarchaeota rrs‡ | 1 | 0.429 | 0.018 | Beam attenuation
pSL12 rrs | 5 | 0.337 | 0.102 | Latitude, salinity, RF, nitrate, silicate
Bacterial rrs‡ | 2 | 0.587 | 0.001 | Beam attenuation, silicate
Archaeal amoA W | 3 | 0.374 | 0.067 | Latitude, RF, silicate
Archaeal amoA F‡ | 2 | 0.374 | 0.010 | Latitude, RF
Bacterial amoA | 2 | 0.264 | 0.269 | Latitude, RF
Archaeal accA | 2 | 0.269 | 0.246 | Latitude, RF

BEST analysis (Clarke, 1993) was performed with PRIMER v6 software (Clarke and Gorley, 2006). Archaeal amoA W, amplified with the Wuchter et al. (2006) amoA primer set; Archaeal amoA F, amplified with the Francis et al. (2005) amoA primer set; RF, relative fluorescence (chlorophyll a equivalents); beam attenuation, turbidity.

*ρ in section A is the Spearman rank correlation coefficient, where ρ > 0 rejects the null hypothesis; all results had a significance of p < 0.001, determined from 999 permutations.

*ρ in section B is the Spearman rank correlation coefficient, where ρ > 0 rejects the null hypothesis; p-values are given.

†p is the significance of the result, determined from 999 permutations.

‡p < 0.05.
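The significance values in Table A7 come from 999 permutations. Under the common convention p = (1 + number of permuted statistics at least as large as the observed one) / (1 + number of permutations), the smallest attainable p-value with 999 permutations is 0.001. The sketch below applies that convention to a single correlation statistic; note that PRIMER's global BEST test permutes samples and repeats the whole variable-subset search, which this simplified, hypothetical example does not do.

```python
import random


def permutation_p_value(statistic, x, y, n_perm=999, seed=1):
    """One-sided permutation p-value: (1 + #{permuted >= observed}) / (1 + n_perm)."""
    rng = random.Random(seed)
    observed = statistic(x, y)
    shuffled = list(y)
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any pairing between x and y
        if statistic(x, shuffled) >= observed:
            at_least += 1
    return (1 + at_least) / (1 + n_perm)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5
```

Counting the observed statistic in both numerator and denominator keeps the p-value strictly positive, so a result reported from 999 permutations can be no smaller than 0.001.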
FIGURE A1 | Phylogenetic analysis of (A) amoA and (B) accA genes retrieved from Station D5. Neighbor-joining trees were built with ARB (Ludwig et al., 2004) from nucleotide sequences 595 bp (amoA) or 411 bp (accA) in length. Sequences in bold were obtained from isolates or enrichment cultures. Bootstrap values were obtained by resampling the tree 1,000 times; only values above 75% bootstrap support are shown on the tree.
FIGURE A2 | Distribution of various taxa in pyrosequenced libraries of rrs in samples from the northern Gulf of Mexico: (A) samples taken from depths <25 m; (B) samples from depths >100 m.
FIGURE A3 | Principal components analysis (PCA) of samples using environmental variables. Fluorescence, relative fluorescence (chlorophyll a equivalents); beam attenuation, turbidity. Samples are shown as symbols representing three groupings based on depth and location: orange circles = near-surface inshore (<100 m depth, over the continental shelf), green diamonds = near-surface offshore (<100 m depth, shelf break and beyond), and blue squares = deep offshore (>100 m, shelf break and beyond).
FIGURE A4 | Continued

Frontiers in Microbiology | Aquatic Microbiology
April 2013 | Volume 4 | Article 72 | 32
Tolar et al.
Thaumarchaeota in the Gulf of Mexico

[Figure A4 dendrogram panels: sample clusters labeled IS#, OS#, and OD# (Station.Depth); scale bars 1 dash ~ 0.0006 and ~ 0.0004 branch length units.]



FIGURE A4 | Jackknife clustering analysis of pyrosequenced Thaumarchaeota rrs genes using Fast UniFrac (Hamady et al., 2009). Resampling of (A) 2 (minimum; n = 43 samples), (B) 100 (first quartile; n = 32 samples), or (C) 520 (median; n = 22 samples) sequences was performed for each of 100 iterations of the jackknife analysis. Colors indicate the percentage of iterations supporting a given node: red (>99.9%), yellow (90-99.9%), green (70-90%), blue (50-70%), or gray (<50%). Sample groups are indicated as: IS# = inshore, near-surface; OS# = offshore, near-surface; OD# = offshore, deep; with # indicating the sample as Station.Depth.
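The caption's three resampling depths retain progressively fewer samples, which is consistent with samples holding fewer sequences than the chosen depth being excluded. A minimal sketch of the subsampling step is shown below, drawing sequences without replacement for each iteration; the function name is hypothetical, and the later steps (computing UniFrac distances per iteration, clustering, typically by UPGMA, and counting how often each node recurs) are not shown.

```python
import random


def jackknife_subsamples(samples, depth, iterations=100, seed=0):
    """Draw `iterations` random subsamples of `depth` sequences per sample, without replacement.

    samples: mapping of sample name -> list of sequence IDs (or sequences).
    Samples with fewer than `depth` sequences are dropped before resampling.
    """
    rng = random.Random(seed)
    kept = {name: seqs for name, seqs in samples.items() if len(seqs) >= depth}
    return [
        {name: rng.sample(seqs, depth) for name, seqs in kept.items()}
        for _ in range(iterations)
    ]
```

Raising the depth trades sample coverage for less noisy per-sample estimates, which matches the pattern of n = 43, 32, and 22 samples at depths of 2, 100, and 520 sequences.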
FIGURE A5 | Scatter plot of Archaeal (A) amoA and (B) accA versus rrs gene abundance in samples from the northern Gulf of Mexico. Solid lines indicate the 1:1 ratios expected from the "Ca. N. maritimus" strain SCM1 genome (Walker et al., 2010); dashed lines indicate ratios of 0.1 or 10. "Inshore," over the continental shelf; "offshore," shelf break and beyond.
FIGURE A7 | Principal coordinates analysis (PCoA) of Thaumarchaeota rrs sequences obtained through 454 pyrosequencing of 41 samples and clone libraries generated from two depths at Station D5. Shapes indicate sample groupings: dark gray squares = deep, offshore; open triangles = near-surface, offshore; light gray circles = near-surface, inshore. The percentage of the variance explained by each axis is given in parentheses next to the axis title.



FIGURE A6 | Comparison of Archaeal amoA gene abundance estimated by qPCR with primers from Wuchter et al. (2006) or Francis et al. (2005). (A) Profiles of amoA abundance at Stations A6 and B5 obtained using each primer set; ♦ = Wuchter, • = Francis. (B) Abundance of amoA genes estimated using the Francis primers versus abundance estimated using the Wuchter primers. "Deep," >100 m sample depth; "near-surface," <100 m sample depth; "inshore," above the continental shelf; "offshore," shelf break and beyond.
[Figure A8 alignment panels: Francis (F), Wuchter (F), Wuchter (R), and Francis (R) primer regions aligned with environmental amoA sequences. Consensus lines: Wuchter (F) CTGAYTGGGCYTGGACATC; Wuchter (R) TACTGGGCAACAAAGAAGAA; Francis (R) ACATACAGATGGATGG-CCGC.]


FIGURE A8 | Mismatches between amoA primer sequences and environmental sequences retrieved from samples taken in the northern Gulf of Mexico. Sequences shown were collected from Station D5, 200 m depth, and were trimmed to regions complementary to the Wuchter et al. (2006) and Francis et al. (2005) primer sets. The top line represents the consensus sequence, while arrows indicate key differences between environmental and primer sequences.
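Counting primer mismatches of the kind shown in Figure A8 requires treating degenerate IUPAC bases in the primers (for example Y = C or T, S = C or G) as sets of allowed nucleotides. The sketch below compares a primer with an equal-length target site and also provides a reverse complement, since reverse primers are compared against the reverse complement of their binding region. The function names are hypothetical; the primer sequences in the usage line are the Wuchter et al. (2006) primers from Table A1, while the target site is an illustrative single-mismatch variant.

```python
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Complement of each IUPAC code (e.g. R <-> Y, K <-> M, B <-> V, D <-> H).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence that may contain IUPAC codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def mismatches(primer: str, target: str) -> list[int]:
    """0-based positions where the target base is not allowed by the degenerate primer base."""
    if len(primer) != len(target):
        raise ValueError("primer and target site must be the same length")
    return [
        i
        for i, (p, t) in enumerate(zip(primer.upper(), target.upper()))
        if t not in IUPAC[p]
    ]


# Wuchter (F) primer against a hypothetical site with one mismatch near the 3' end.
print(mismatches("CTGAYTGGGCYTGGACATC", "CTGATTGGGCTTGGACGTC"))
print(reverse_complement("TTCTTCTTTGTTGCCCAGTA"))
```

Mismatches close to a primer's 3' end usually impair extension more than those near the 5' end, so reporting positions rather than only a count is useful when comparing primer sets.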